zhiwei / source_forward
Commit b13567ee authored May 15, 2019 by yangchen
Add source detection
parent eba74706
Showing 3 changed files with 24 additions and 13 deletions
src/main/java/com/zhiwei/source_forward/run/SourceForward.java +9 -7
src/main/java/com/zhiwei/source_forward/util/MatchSource.java +13 -6
src/main/resources/sourceList.txt +2 -0
src/main/java/com/zhiwei/source_forward/run/SourceForward.java
@@ -9,6 +9,8 @@ import java.util.Map.Entry;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import com.zhiwei.common.config.GroupType;
import com.zhiwei.crawler.proxy.ProxyFactory;
import com.zhiwei.source_forward.bean.SourceForwardBean;
import com.zhiwei.source_forward.bean.SourceForwardBean.Attribution;
import com.zhiwei.source_forward.crawler.SourceForwardCrawler;
@@ -77,13 +79,13 @@ public class SourceForward {
    }
    public static void main(String[] args) {
//        ProxyFactory.init("zookeeper://192.168.0.36:2181","local",GroupType.PROVIDER);
//        List<String> urlList = new ArrayList<>();
//        urlList.add("https://www.toutiao.com/a6634320415839748621");
//        List<SourceForwardBean> da = SourceForward.getSourceForward(urlList);
//        for(SourceForwardBean sfb : da) {
//            System.out.println(sfb.toString());
//        }
        ProxyFactory.init("zookeeper://192.168.0.36:2181", "local", GroupType.PROVIDER);
        List<String> urlList = new ArrayList<>();
        urlList.add("http://www.northnews.cn/2019/0419/3080909.shtml");
        List<SourceForwardBean> da = SourceForward.getSourceForward(urlList);
        for (SourceForwardBean sfb : da) {
            System.out.println(sfb.toString());
        }
    }
    static class SourceForwardCrawlerThread extends Thread {
src/main/java/com/zhiwei/source_forward/util/MatchSource.java
@@ -39,7 +39,7 @@ public class MatchSource {
    /**
     * @Title: findURLs
     * @author hero
     * @Description: TODO (validate and match data)
     * @Description: (validate and match data)
     * @param @param
     *            s
     * @param @param
@@ -91,17 +91,24 @@ public class MatchSource {
                } else {
                    source = html.split("source\":\"")[1].split("\",\"")[0];
                }
            } else if (url.contains("tech.china.com")) {
                // China.com Tech (中华网科技)
                source = document.select("#chan_newsInfo").text().split("来源:")[1];
            } else if (url.contains("caijing.com.cn")) {
                // Caijing.com Industry (财经网产经)
                source = document.select("#source_baidu").text();
            } else {
                // Handle all other sites
                source = mathchOtherSource(html, htmlBody, sourceList);
            }
            if (source != null) {
                // Validate the source
                for (String sourceMatch : sourceList) {
                    if (source.contains(sourceMatch)) {
                        return sourceMatch;
                    }
                }
//                for (String sourceMatch : sourceList) {
//                    if (source.contains(sourceMatch)) {
//                        return sourceMatch;
//                    }
//                }
                return source;
            }
        } catch (Exception e) {
            e.printStackTrace();
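The per-site branches in the MatchSource.java hunk index directly into the result of `split`, which throws when the marker is missing and leaves the surrounding `catch` to absorb the failure. Below is a minimal sketch of a guarded alternative using only the standard library. `SourceExtractSketch`, `textAfterMarker`, and `normalize` are hypothetical names that do not appear in the commit, and the sketch assumes the page text has already been extracted (for example by the `document.select(...).text()` calls in the diff).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration; not part of the commit.
public class SourceExtractSketch {

    /**
     * Returns the trimmed text after the first occurrence of marker,
     * or null if the marker is absent or nothing follows it.
     * Unlike split(marker)[1], this keeps everything after the first marker.
     */
    static String textAfterMarker(String text, String marker) {
        if (text == null) {
            return null;
        }
        int idx = text.indexOf(marker);
        if (idx < 0) {
            return null;
        }
        String rest = text.substring(idx + marker.length()).trim();
        return rest.isEmpty() ? null : rest;
    }

    /**
     * Maps a raw source string onto the first known name it contains;
     * returns the raw string unchanged when no known name matches.
     */
    static String normalize(String source, List<String> knownSources) {
        if (source == null) {
            return null;
        }
        for (String known : knownSources) {
            if (source.contains(known)) {
                return known;
            }
        }
        return source;
    }

    public static void main(String[] args) {
        List<String> known = Arrays.asList("今日湖北", "最高人民法院网");
        String raw = textAfterMarker("2019-04-19 来源:今日湖北网站", "来源:");
        System.out.println(normalize(raw, known));                        // prints 今日湖北
        System.out.println(textAfterMarker("no marker here", "来源:"));   // prints null
    }
}
```

Returning `null` for a missing marker lets the caller fall through to a generic handler instead of relying on an exception for control flow.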
src/main/resources/sourceList.txt
@@ -3052,3 +3052,4 @@ ZOL中关村在线
走进中关村
最高人民法院网
最高人民检察院
今日湖北
\ No newline at end of file
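This commit does not show how the project loads sourceList.txt. As a sketch only, assuming the file holds one source name per line, a loader might look like the following. `SourceListSketch` and `readSources` are hypothetical names, and a `StringReader` stands in for the real resource stream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical loader for illustration; the project's real loading code is not in this commit.
public class SourceListSketch {

    /** Reads one source name per line, trimming whitespace and skipping blank lines. */
    static List<String> readSources(Reader in) throws IOException {
        List<String> sources = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String name = line.trim();
                if (!name.isEmpty()) {
                    sources.add(name);
                }
            }
        }
        return sources;
    }

    public static void main(String[] args) throws IOException {
        // The last entry has no trailing newline, mirroring the end of the file in the diff.
        String sample = "最高人民法院网\n最高人民检察院\n\n今日湖北";
        List<String> sources = readSources(new StringReader(sample));
        System.out.println(sources.size()); // prints 3
    }
}
```

`BufferedReader.readLine` returns the final line whether or not the file ends with a newline, so the missing newline at the end of the file does not drop the last entry.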