Commit 4fafcc87 authored Mar 26, 2018 by zhiwei
1. Add detection of the "original" (原创) tag on WeChat articles
parent 4ac6afdf
Showing 1 changed file with 12 additions and 0 deletions
src/main/java/com/zhiwei/source_forward/crawler/SourceForwardPageProcessor.java
@@ -3,6 +3,8 @@ package com.zhiwei.source_forward.crawler;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
+import org.jsoup.nodes.Document;
 import org.jsoup.nodes.Node;
 import com.zhiwei.source_forward.util.SourceData;
 import com.zhiwei.source_forward.util.TreateData;
@@ -40,6 +42,16 @@ public class SourceForwardPageProcessor implements PageProcessor {
                     channel = TreateData.matchChannel(nodeList);
                 }
                 source = TreateData.matchSource(page.getUrl().get(), page.getHtml().toString(), sourceList);
+                if (page.getUrl().get().contains("mp.weixin.qq.com")) {
+                    String isforward = "未知";
+                    Document document = page.getHtml().getDocument();
+                    if (document.select("div#meta_content").select("span.rich_media_meta meta_original_tag") != null && !"".equals(document.select("div#meta_content").select("span.rich_media_meta meta_original_tag"))) {
+                        isforward = document.select("div#meta_content").select("span.rich_media_meta meta_original_tag").text();
+                        data.put("isforward", isforward);
+                    }
+                }
             }
         } catch (Exception e) {
             source = null;
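Review sketch (not part of the commit): the guard added in the hunk above appears to be always true. jsoup's `select()` returns an empty `Elements` collection rather than `null`, and `"".equals(...)` compares a `String` with an `Elements` object, so it is always false. The selector `span.rich_media_meta meta_original_tag` also reads as a descendant selector for an element named `meta_original_tag`; if the page marks the badge with two classes on one `span`, the compound form `span.rich_media_meta.meta_original_tag` is likely what was intended. The following is a minimal sketch under those assumptions. The class name `OriginalTagSketch`, its helper methods, and the sample HTML are hypothetical, and returning the fallback value when the tag is missing is a design choice of this sketch, not the commit's behavior. It assumes jsoup is on the classpath, as the project's imports suggest.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

// Hypothetical helper, not part of the commit.
public class OriginalTagSketch {

    // Same fallback as the commit's literal (Chinese for "unknown"),
    // written with Unicode escapes to keep this file ASCII-only.
    static final String UNKNOWN = "\u672a\u77e5";

    // Returns the text of the "original" badge, or UNKNOWN when it is absent or blank.
    static String originalTag(Document document) {
        // Compound selector: a <span> carrying BOTH classes, inside div#meta_content.
        Elements tag = document.select(
                "div#meta_content span.rich_media_meta.meta_original_tag");
        // select() never returns null, so check for an empty result instead.
        if (tag.isEmpty()) {
            return UNKNOWN;
        }
        String text = tag.text().trim();
        return text.isEmpty() ? UNKNOWN : text;
    }

    // Convenience wrapper so callers and tests can pass raw HTML.
    static String originalTagFromHtml(String html) {
        return originalTag(Jsoup.parse(html));
    }

    public static void main(String[] args) {
        // Assumed markup; the real WeChat page structure may differ.
        String withTag = "<div id=\"meta_content\">"
                + "<span class=\"rich_media_meta meta_original_tag\">\u539f\u521b</span>"
                + "</div>";
        String withoutTag = "<div id=\"meta_content\"></div>";
        System.out.println(originalTagFromHtml(withTag));
        System.out.println(originalTagFromHtml(withoutTag));
    }
}
```

In the page processor this could be called as `data.put("isforward", OriginalTagSketch.originalTag(page.getHtml().getDocument()));`, which would always record a value for WeChat URLs, with the fallback when no badge is present.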