Commit 4dac5870, authored Jan 16, 2018 by zhiwei
1. Add the Yidian Zixun (一点资讯) matching rule to account-source collection for self-media accounts
Parent: cd456869
Showing 2 changed files with 8 additions and 8 deletions
src/main/java/com/zhiwei/source_forward/crawler/MediaSelfSourcePageProcessor.java (+0 -3)
src/main/java/com/zhiwei/source_forward/util/TreateData.java (+8 -5)
src/main/java/com/zhiwei/source_forward/crawler/MediaSelfSourcePageProcessor.java
 package com.zhiwei.source_forward.crawler;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 import org.jsoup.nodes.Node;
 import com.zhiwei.source_forward.util.SourceData;
 import com.zhiwei.source_forward.util.TreateData;
 import us.codecraft.webmagic.Page;
 import us.codecraft.webmagic.Site;
...
src/main/java/com/zhiwei/source_forward/util/TreateData.java
...
@@ -123,7 +123,7 @@ public class TreateData {
             if (url.contains("toutiao.com")) { // Toutiao (今日头条) account matching
                 if (html.contains(" source: '")) {
-                    source = "今日头条-" + html.split(" source: '")[1].split("',")[0];
+                    source = "今日头条-" + html.split("source: '")[1].split("',")[0];
                 }
             } else if (url.contains("sohu.com")) { // Sohu (搜狐) self-media account
...
@@ -144,6 +144,13 @@ public class TreateData {
             } else if (url.contains("baijia.baidu.com")) { // Baidu Baijia (百度百家)
                 source = "百家号-" + document.select("section.info").select("span.author").text();
+            } else if (url.contains("yidianzixun.com")) { // Yidian Zixun (一点资讯)
+                if (html.contains("related_wemedia")) {
+                    source = "一点号-" + html.split("media_name\":\"")[1].split("\",\"")[0];
+                } else {
+                    source = html.split("source\":\"")[1].split("\",\"")[0];
+                }
             }
             return source;
         } catch (Exception e) {
...
@@ -151,10 +158,6 @@ public class TreateData {
         }
     }
     /**
      * @Title: matchChannel
...
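Review note: the new yidianzixun.com branch relies on `split(...)[1]`, which throws `ArrayIndexOutOfBoundsException` when the marker is missing and leaves recovery to the surrounding `catch (Exception e)`. Below is a minimal sketch of the same rule written with `indexOf`, assuming an explicit "no match" result is preferable to an exception. The class `SourceMatchSketch`, the methods `between` and `matchYidian`, and the sample JSON strings are hypothetical and are not part of this commit. One behavioral difference: when the closing delimiter is absent, this sketch reports no match, whereas `split(...)[0]` would return the whole remainder.

```java
import java.util.Optional;

/** Hypothetical helper, not part of the commit: marker-based extraction without split(). */
final class SourceMatchSketch {

    /** Returns the text between the first `start` and the next `end`, or empty if either is missing. */
    static Optional<String> between(String text, String start, String end) {
        int from = text.indexOf(start);
        if (from < 0) {
            return Optional.empty();
        }
        from += start.length();
        int to = text.indexOf(end, from);
        if (to < 0) {
            return Optional.empty();
        }
        return Optional.of(text.substring(from, to));
    }

    /** Mirrors the yidianzixun.com branch from the diff, using the same markers and prefix. */
    static Optional<String> matchYidian(String html) {
        if (html.contains("related_wemedia")) {
            // We-media account page: prefix the account name as the diff does.
            return between(html, "media_name\":\"", "\",\"").map(name -> "一点号-" + name);
        }
        // Otherwise fall back to the plain "source" field, without a prefix.
        return between(html, "source\":\"", "\",\"");
    }

    public static void main(String[] args) {
        // Made-up page fragments, for illustration only.
        String weMedia = "{\"related_wemedia\":{\"media_name\":\"Example\",\"id\":\"1\"}}";
        String plain = "{\"source\":\"Example News\",\"date\":\"2018-01-16\"}";
        System.out.println(matchYidian(weMedia).orElse("(no match)"));
        System.out.println(matchYidian(plain).orElse("(no match)"));
        System.out.println(matchYidian("<html></html>").orElse("(no match)"));
    }
}
```

Because `indexOf` treats its argument literally, this also avoids the regex interpretation that `String.split` applies to its pattern. None of the markers in this commit contain regex metacharacters, so that difference does not change results here.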