zhiwei / source_forward

Commit d705de1f
Authored Jul 08, 2021 by chenweiyang
Parent: 364aa66a

Adjust the deleted-page check

Showing 2 changed files with 6 additions and 3 deletions
src/main/java/com/zhiwei/source_forward/crawler/UrlLiveCrawler.java  +5 -2
src/main/java/com/zhiwei/source_forward/run/URLLive.java  +1 -1
src/main/java/com/zhiwei/source_forward/crawler/UrlLiveCrawler.java @ d705de1f
...
@@ -216,6 +216,9 @@ public class UrlLiveCrawler {
             if (Objects.isNull(title) || title.isEmpty()) {
                 title = doc.select("h2").text();
             }
+            if (Objects.isNull(title) || title.isEmpty()) {
+                title = doc.select("div.weui-msg__text-area > h3").text();
+            }
             // Get the title
             Matcher ma5 = Pattern.compile("var msg_title = \'(.*)\'")
                     .matcher(result);
...
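Review note: the context line in this hunk extracts the title from the inline script with `var msg_title = \'(.*)\'`. Because `(.*)` is greedy, a line holding more than one quoted value would be captured up to the last quote on that line. Below is a minimal sketch of a stricter pattern, assuming the title itself never contains a single quote. The class `MsgTitleExtractor` and its method are hypothetical and not part of this project.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MsgTitleExtractor {
    // Hypothetical helper: [^']* stops at the first closing quote,
    // whereas the greedy (.*) runs to the last quote on the line.
    private static final Pattern MSG_TITLE =
            Pattern.compile("var msg_title = '([^']*)'");

    static Optional<String> extract(String html) {
        if (html == null) {
            return Optional.empty();
        }
        Matcher m = MSG_TITLE.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String page = "var msg_title = 'Hello'; var msg_desc = 'World';";
        // Prints the first quoted value only.
        System.out.println(extract(page).orElse(""));
    }
}
```

Returning an `Optional` keeps the existing "title is null or empty" fallback chain easy to express at the call site, for example `title = MsgTitleExtractor.extract(result).orElse(title)`.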
@@ -224,7 +227,7 @@ public class UrlLiveCrawler {
             }
             if (Objects.isNull(title) || title.isEmpty()) {
                 if (result.contains("此帐号已被屏蔽, 内容无法查看") || result.contains("该公众号已迁移") || result.contains("此帐号已自主注销,内容无法查看")
-                        || result.contains("此帐号处于帐号迁移流程中") || result.contains("该内容已被发布者删除")) {
+                        || result.contains("此帐号处于帐号迁移流程中") || result.contains("该内容已被发布者删除") || result.contains("此内容被投诉且经审核涉嫌侵权")) {
                     title = "网页已删除";
                 }
             }
...
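Review note: this hunk adds a sixth `result.contains(...)` clause to an already long condition. One way to keep future additions to a single line is to move the marker strings into a list and test them with `anyMatch`. The following is a minimal sketch, not the project's code: the class `DeletedPageMarkers` and its members are hypothetical, while the marker strings and the replacement title are copied from the hunk above.

```java
import java.util.Arrays;
import java.util.List;

class DeletedPageMarkers {
    // Marker strings copied from the condition in this hunk.
    static final List<String> MARKERS = Arrays.asList(
            "此帐号已被屏蔽, 内容无法查看",
            "该公众号已迁移",
            "此帐号已自主注销,内容无法查看",
            "此帐号处于帐号迁移流程中",
            "该内容已被发布者删除",
            "此内容被投诉且经审核涉嫌侵权");

    // Title assigned by the original code when a marker is found.
    static final String DELETED_TITLE = "网页已删除";

    // True when the page source contains any known marker.
    static boolean isDeleted(String result) {
        return result != null && MARKERS.stream().anyMatch(result::contains);
    }

    // Mirrors the hunk: only an empty or null title is replaced.
    static String resolveTitle(String title, String result) {
        if ((title == null || title.isEmpty()) && isDeleted(result)) {
            return DELETED_TITLE;
        }
        return title;
    }
}
```

With this shape, supporting a new takedown message means appending one string to `MARKERS` rather than editing the condition.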
@@ -324,7 +327,7 @@ public class UrlLiveCrawler {
                 , "百度新闻——全球最大的中文新闻平台", "以上文章由以下机构判定为不实信息", "该公众号已迁移", "财经网-CAIJING.COM.CN", "蚂蚁资讯", "参数错误"
                 , "时尚头条_YOKA时尚网", "该文章已经被删除", "网易", "链接已过期", "找不到页面", "今晚网"
                 , "该文章已被删除", "该回答已被删除-知乎", "资源不存在", "文章未找到"
-                , "UC头条", "该内容暂无法显示", "手机搜狐网");
+                , "UC头条", "该内容暂无法显示", "手机搜狐网", "此内容被投诉且经审核涉嫌侵权,无法查看。");
         List<String> cList = Arrays.asList("提示信息-", "此内容因违规无法查看", "微信公众号不存在",
                 "此内容被投诉且经审核涉嫌侵权,无法查看", "thepageyourequestedwasnotfound", "未知错误"
...
src/main/java/com/zhiwei/source_forward/run/URLLive.java @ d705de1f
...
@@ -72,7 +72,7 @@ public class URLLive {
     public static void main(String[] args) {
         ProxyInit.initProxy();
         List<String> urlList = new ArrayList<>();
-        urlList.add("https://www.toutiao.com/a6982350814614405670/");
+        urlList.add("https://mp.weixin.qq.com/s/YLlXGwlSugJpXTIqrLgPPw");
 //        urlList.add("http://www.yidianzixun.com/article/0PYO4Gbh");
         List<UrlLiveBean> u = URLLive.verificationURLLive(urlList);
...