zhiwei / source_forward
Commit 6e20a1f6, authored Jul 08, 2021 by chenweiyang
Error handling for the deleted-or-not check
parent 72bdcd09
Showing 1 changed file with 11 additions and 8 deletions
src/main/java/com/zhiwei/source_forward/crawler/UrlLiveCrawler.java
@@ -222,9 +222,11 @@ public class UrlLiveCrawler {
                 if (ma5.find()) {
                     title = ma5.group(1).replaceAll(" ", " ").trim();
                 }
-                if (result.contains("此帐号已被屏蔽, 内容无法查看") || result.contains("该公众号已迁移") || result.contains("此帐号已自主注销,内容无法查看")
-                        || result.contains("此帐号处于帐号迁移流程中") || result.contains("该内容已被发布者删除")) {
-                    title = "网页已删除";
+                if (Objects.isNull(title) || title.isEmpty()) {
+                    if (result.contains("此帐号已被屏蔽, 内容无法查看") || result.contains("该公众号已迁移") || result.contains("此帐号已自主注销,内容无法查看")
+                            || result.contains("此帐号处于帐号迁移流程中") || result.contains("该内容已被发布者删除")) {
+                        title = "网页已删除";
+                    }
                 }
             } else if (url.contains("kuaibao")){
                 title = doc.select("p.title").text().replaceAll(" ", "");
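Note on the hunk above: the deleted-page notices are now consulted only when no title was extracted, so a page that yields a real title keeps it even if one of the notice strings appears in the body. A minimal standalone sketch of that guard follows. The class name `DeletedTitleGuard`, the method `resolveTitle`, and the constant names are hypothetical and exist only for illustration; the notice strings and the placeholder title are copied from the diff.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical helper, not part of UrlLiveCrawler; it isolates the guard from the hunk.
class DeletedTitleGuard {
    // Placeholder title used by the diff when a page is considered deleted.
    static final String DELETED_TITLE = "网页已删除";

    // Notice strings copied verbatim from the diff.
    static final List<String> DELETION_NOTICES = Arrays.asList(
            "此帐号已被屏蔽, 内容无法查看",
            "该公众号已迁移",
            "此帐号已自主注销,内容无法查看",
            "此帐号处于帐号迁移流程中",
            "该内容已被发布者删除");

    // Returns the extracted title unchanged unless it is null or empty
    // AND the raw page text contains one of the deletion notices.
    static String resolveTitle(String title, String result) {
        if (Objects.isNull(title) || title.isEmpty()) {
            for (String notice : DELETION_NOTICES) {
                if (result.contains(notice)) {
                    return DELETED_TITLE;
                }
            }
        }
        return title;
    }

    public static void main(String[] args) {
        // A real title survives even though a notice string is present.
        System.out.println(resolveTitle("Some article", "... 该公众号已迁移 ..."));
        // An empty title plus a notice string yields the placeholder.
        System.out.println(resolveTitle("", "... 该公众号已迁移 ..."));
    }
}
```

With the pre-change logic, the first call would also have produced the placeholder, since the notice check ran regardless of whether a title had been found.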
@@ -263,6 +265,12 @@ public class UrlLiveCrawler {
                 } catch (Exception e) {
                     logger.error(" uc 数据 json 转换失败", e);
                 }
+            } else if (attr.getAttr().toString().contains("toutiao.com")) {
+                if (result.contains("\"success\":false")) {
+                    title = "网页已删除";
+                } else {
+                    title = String.valueOf(JSONPath.read(result, "$..title"));
+                }
             }
             // If the title was not obtained, use this method
@@ -280,11 +288,6 @@ public class UrlLiveCrawler {
                 title = doc.select("h1").text().replaceAll(" ", "");
             }
-            if (result.contains("\"success\":false") && attr.getAttr().toString().contains("toutiao.com")) {
-                title = "网页已删除";
-            } else {
-                title = String.valueOf(JSONPath.read(result, "$..title"));
-            }
             // If the title was not obtained, use this method. When the title cannot be retrieved, do not make misleading guesses in the program
             // if(Objects.isNull(title) || title.length() < 1 || result.length() < 200) {
             // title = "网页已删除";
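Note on the last two hunks: in the removed block, the `else` branch ran for every case that was not a failed toutiao.com response, so the JSONPath lookup could overwrite a title already extracted for another site. The added branch confines both the deleted check and the lookup to toutiao.com. The sketch below contrasts the two control flows under simplifying assumptions: the enclosing `else if` chain is not visible in the diff and is left out, `attr` is reduced to a plain string, and `jsonTitle` is a hypothetical stand-in for `String.valueOf(JSONPath.read(result, "$..title"))`. The class and method names are illustrative only.

```java
import java.util.function.Function;

// Hypothetical comparison of the two control flows; not code from the repository.
class ToutiaoTitleScope {
    static final String DELETED_TITLE = "网页已删除";

    // Pre-change flow: the else branch also runs for non-toutiao URLs.
    static String oldFlow(String title, String result, String attr,
                          Function<String, String> jsonTitle) {
        if (result.contains("\"success\":false") && attr.contains("toutiao.com")) {
            title = DELETED_TITLE;
        } else {
            title = jsonTitle.apply(result); // overwrites whatever was extracted earlier
        }
        return title;
    }

    // Post-change flow: both outcomes are scoped to toutiao.com.
    static String newFlow(String title, String result, String attr,
                          Function<String, String> jsonTitle) {
        if (attr.contains("toutiao.com")) {
            if (result.contains("\"success\":false")) {
                title = DELETED_TITLE;
            } else {
                title = jsonTitle.apply(result);
            }
        }
        return title;
    }

    public static void main(String[] args) {
        // Stand-in lookup that finds no title, mimicking String.valueOf(null).
        Function<String, String> noTitle = r -> "null";
        System.out.println(oldFlow("Kept title", "<html/>", "example.com", noTitle));
        System.out.println(newFlow("Kept title", "<html/>", "example.com", noTitle));
    }
}
```

Under these assumptions the old flow replaces "Kept title" with the lookup result for a non-toutiao page, while the new flow leaves it alone.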