zhiwei / source_forward
Commit 4f5be1c7 authored Dec 18, 2017 by zhiwei
Add channel verification to source verification
parent 0f93a339
Showing 1 changed file with 44 additions and 0 deletions
src/main/java/com/zhiwei/source_forward/crawler/SourceForwardPageProcessor.java
@@ -33,11 +33,15 @@ public class SourceForwardPageProcessor implements PageProcessor {
         String source = null;
         String channel = "新闻";
         try {
+            channel = verifyChannel(page.getUrl().get());
+            if(channel == null){
             if(page.getStatusCode()!=404){
                 List<Node> nodeList = page.getHtml().getDocument().head().childNodes();
                 source = TreateData.matchSource(page.getUrl().get(), page.getHtml().toString(), sourceList);
                 channel = TreateData.matchChannel(nodeList);
             }
+            }
         } catch (Exception e) {
             source = null;
             channel = "新闻";
@@ -51,4 +55,44 @@ public class SourceForwardPageProcessor implements PageProcessor {
     }
+    /**
+     * @Title: verifyChannel
+     * @author hero
+     * @Description: Determine the article's channel from its URL
+     * @param @param url
+     * @param @return settings file
+     * @return String return type
+     */
+    private static String verifyChannel(String url){
+        String channel = null;
+        if(url.contains("news.")){
+            channel = "新闻";
+        }else if(url.contains("finance.") || url.contains("business.") || url.contains("money.")){
+            channel = "财经";
+        }else if(url.contains("tech.") || url.contains("it.")){
+            channel = "科技";
+        }else if(url.contains("sports.")){
+            channel = "体育";
+        }else if(url.contains("ent.") || url.contains("yule.")){
+            channel = "娱乐";
+        }else if(url.contains("auto.")){
+            channel = "汽车";
+        }else if(url.contains("fashion.")){
+            channel = "时尚";
+        }else if(url.contains("learning.") || url.contains("edu.")){
+            channel = "教育";
+        }else if(url.contains("baobao.")){
+            channel = "母婴";
+        }else if(url.contains("house.") || url.contains("leju.") || url.contains("focus.")){
+            channel = "房产";
+        }else if(url.contains("games.")){
+            channel = "游戏";
+        }
+        return channel;
+    }
 }
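
The `verifyChannel` method in this commit matches keywords anywhere in the URL string, so a fragment such as `"it."` would also match a host like `audit.example.com`, and `"ent."` could match inside a path. A minimal sketch of one possible alternative is shown below, assuming it is acceptable to match whole host labels only. The class name `ChannelByHost` and the example URLs are hypothetical and not part of the repository; the keyword-to-channel pairs are copied from the commit (新闻 news, 财经 finance, 科技 technology, 体育 sports, 娱乐 entertainment, 汽车 auto, 时尚 fashion, 教育 education, 母婴 parenting, 房产 real estate, 游戏 games).

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the commit: table-driven channel lookup
// that compares whole host labels instead of substrings of the full URL.
class ChannelByHost {
    // Host label -> channel; the pairs mirror verifyChannel in the commit.
    private static final Map<String, String> CHANNELS = new HashMap<>();
    static {
        CHANNELS.put("news", "新闻");
        CHANNELS.put("finance", "财经");
        CHANNELS.put("business", "财经");
        CHANNELS.put("money", "财经");
        CHANNELS.put("tech", "科技");
        CHANNELS.put("it", "科技");
        CHANNELS.put("sports", "体育");
        CHANNELS.put("ent", "娱乐");
        CHANNELS.put("yule", "娱乐");
        CHANNELS.put("auto", "汽车");
        CHANNELS.put("fashion", "时尚");
        CHANNELS.put("learning", "教育");
        CHANNELS.put("edu", "教育");
        CHANNELS.put("baobao", "母婴");
        CHANNELS.put("house", "房产");
        CHANNELS.put("leju", "房产");
        CHANNELS.put("focus", "房产");
        CHANNELS.put("games", "游戏");
    }

    /** Returns the channel of the first known host label, or null if none matches. */
    static String verifyChannel(String url) {
        if (url == null) {
            return null;
        }
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return null; // malformed URL: let the caller fall back to other checks
        }
        if (host == null) {
            return null;
        }
        // Walk the labels left to right, e.g. "tech.example.com" -> tech, example, com.
        for (String label : host.toLowerCase(Locale.ROOT).split("\\.")) {
            String channel = CHANNELS.get(label);
            if (channel != null) {
                return channel;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(verifyChannel("http://tech.example.com/a/1.html"));  // 科技
        System.out.println(verifyChannel("http://audit.example.com/report"));   // null
    }
}
```

This sketch is not a drop-in replacement. The commit's `if`/`else` chain gives `news.` priority over every other keyword, while the loop above returns whichever known label comes first in the host name, and it no longer matches keywords that appear in the path or in domains such as `edu.cn`. Whether that stricter behavior is wanted depends on the sites being crawled.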