zhiwei / source_forward
Commit 29320d28, authored Jul 02, 2019 by yangchen
Bump version; add more matched sources
parent ab587b25
Showing 5 changed files with 16 additions and 3 deletions
pom.xml  +1 -1
src/main/java/com/zhiwei/source_forward/crawler/SourceForwardCrawler.java  +3 -1
src/main/java/com/zhiwei/source_forward/run/SourceForward.java  +1 -1
src/main/java/com/zhiwei/source_forward/util/MatchSource.java  +11 -0
src/main/resources/sourceList.txt  +0 -0
pom.xml
...
@@ -3,7 +3,7 @@
   <modelVersion>4.0.0</modelVersion>
   <groupId>com.zhiwei</groupId>
   <artifactId>source-forward</artifactId>
-  <version>0.1.8-SNAPSHOT</version>
+  <version>0.1.9-SNAPSHOT</version>
   <name>source-forward</name>
   <description>Verifies repost relationships between online media outlets and the validity of links (the match rate is low when verifying reposts from WeChat and self-media accounts)</description>
...
src/main/java/com/zhiwei/source_forward/crawler/SourceForwardCrawler.java
...
@@ -66,7 +66,9 @@ public class SourceForwardCrawler {
         if (url.contains("www.toutiao.com")){
             headers.put("referer", url);
         }
+        if (url.contains("china.prcfe.com")) {
+            url = "http://china.prcfe.com/e/extend/ShowSource/?id=" + url.split("/")[url.split("/").length - 1].split("\\.")[0];
+        }
         Request request = RequestUtils.wrapGet(url, headers);
         counter.add();
         httpBoot.asyncCall(request, ProxyHolder.NAT_HEAVY_PROXY, true).whenComplete((rs, ex) -> {
...
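The rewrite added in this hunk takes the last path segment of a china.prcfe.com article URL, drops its file extension, and uses what remains as the `id` parameter of the ShowSource endpoint. A minimal sketch of that transformation in isolation, assuming the article URL ends in a file name such as `12345.html`. The class name `PrcfeUrlSketch`, the method name, and the sample URL are hypothetical and not part of the commit.

```java
final class PrcfeUrlSketch {
    // Endpoint prefix copied from the hunk.
    private static final String SHOW_SOURCE = "http://china.prcfe.com/e/extend/ShowSource/?id=";

    /** Same logic as the expression in the hunk, but the URL is split only once. */
    static String toShowSourceUrl(String url) {
        String[] segments = url.split("/");
        String fileName = segments[segments.length - 1]; // last path segment, e.g. "12345.html"
        String id = fileName.split("\\.")[0];             // text before the first dot, e.g. "12345"
        return SHOW_SOURCE + id;
    }

    public static void main(String[] args) {
        // Hypothetical article URL, used only for illustration.
        System.out.println(toShowSourceUrl("http://china.prcfe.com/caijing/2019/0702/12345.html"));
    }
}
```

Splitting once and naming the intermediate values keeps the behavior of the one-line expression while making each step easier to check.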
src/main/java/com/zhiwei/source_forward/run/SourceForward.java
...
@@ -81,7 +81,7 @@ public class SourceForward {
     public static void main(String[] args) {
         ProxyFactory.init("zookeeper://192.168.0.36:2181", "local", GroupType.PROVIDER);
         List<String> urlList = new ArrayList<>();
-        urlList.add("http://industry.caijing.com.cn/20190423/4582310.shtml");
+        urlList.add("http://stock.10jqka.com.cn/usstock/20190621/c612094454.shtml");
         List<SourceForwardBean> da = SourceForward.getSourceForward(urlList);
         for (SourceForwardBean sfb : da) {
             System.out.println(sfb.toString());
...
src/main/java/com/zhiwei/source_forward/util/MatchSource.java
...
@@ -184,6 +184,9 @@ public class MatchSource {
         }else if(url.contains("finance.ifeng.com")){ // special handling for ifeng.com (Phoenix)
             source = document.select("p.p_time").select("span").select("span").select("a").text();
+            if (Objects.isNull(source) || source.length() < 1) {
+                source = html.split("source\":\"")[1].split("\"")[0];
+            }
         }else if(url.contains("iphone.265g.com")){ // special handling for 265G
             source = document.select("div.article_info").select("span").text().replaceAll(".*来源:|QQ群号.*", "");
...
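The fallback added for finance.ifeng.com reads the value that follows `source":"` in the raw HTML. Indexing `split(...)[1]` throws `ArrayIndexOutOfBoundsException` when the marker is absent from the page. A hedged sketch of the same extraction using `indexOf`, which returns an empty string instead of throwing. The class and method names and the sample input are hypothetical, and returning `""` when no closing quote is found is a choice made for this sketch, not behavior taken from the commit.

```java
final class EmbeddedSourceSketch {
    // Marker text copied from the hunk: source":"
    private static final String MARKER = "source\":\"";

    /** Returns the text between the marker and the next double quote, or "" if either is missing. */
    static String extractSource(String html) {
        int start = html.indexOf(MARKER);
        if (start < 0) {
            return ""; // marker not present on this page
        }
        start += MARKER.length();
        int end = html.indexOf('"', start);
        return end < 0 ? "" : html.substring(start, end);
    }

    public static void main(String[] args) {
        // Hypothetical page fragment, used only for illustration.
        String html = "{\"title\":\"demo\",\"source\":\"Example Daily\"}";
        System.out.println(extractSource(html));
        System.out.println(extractSource("<html></html>").isEmpty());
    }
}
```

An empty result lets the existing `Objects.nonNull(source) && source.length() != 0` style of check decide what to do next, rather than aborting the match with an exception.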
@@ -299,6 +302,14 @@ public class MatchSource {
         }else if(url.contains("finance.youth.cn")){ // special handling for China Youth Online (youth.cn)
             source = document.select("span#source_baidu").text().replaceAll("来源:|作者.*", "");
+        } else if (url.contains("china.com")) {
+            // China Finance and Business News (中国金融商报)
+            source = document.select("#chan_newsInfo > a").text();
+        } else if (url.contains("xw.qq.com")) {
+            // Tencent News client
+            source = document.select("div.tpl_header_author").text();
+        } else if (url.contains("china.prcfe.com")) {
+            source = html.split("\"")[1];
         }
         if (Objects.nonNull(source) && source.length() != 0) {
...
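Several branches in this file clean the extracted text with `replaceAll`, for example `"来源:|作者.*"`, which removes the label 来源: ("Source:") and everything from 作者 ("Author") onward. A small sketch of that cleanup on its own, assuming byline text shaped like the sample below. The class name, method name, and sample string are hypothetical, and the trailing `trim()` is an addition in this sketch that the commit does not make.

```java
final class SourceLabelSketch {
    /** Strips the "来源:" label and any trailing "作者..." part, then trims whitespace. */
    static String stripLabels(String raw) {
        // Pattern copied from the finance.youth.cn branch; trim() is added here.
        return raw.replaceAll("来源:|作者.*", "").trim();
    }

    public static void main(String[] args) {
        // Hypothetical byline text, used only for illustration.
        System.out.println(stripLabels("来源:中国青年网 作者:张三"));
    }
}
```

Because `.` does not match line breaks by default, `作者.*` only removes text up to the end of the line on which 作者 appears.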
src/main/resources/sourceList.txt
This source diff could not be displayed because it is too large. You can view the blob instead.