zhiwei / media_data_crawler
Commit 1325c572, authored Apr 18, 2018 by zhiwei
Fix answer-count handling when crawling Zhihu via Sogou
Parent: 2ca74931
Showing 1 changed file with 15 additions and 3 deletions.
src/main/java/com/zhiwei/media_data_crawler/crawler/SougouZhihuCrawlerParse.java (+15, -3)
@@ -178,10 +178,19 @@ public class SougouZhihuCrawlerParse extends HttpClientTemplateOK {
     ZhiHuData zhihu = null;
     if(type.contains("文章")){
         String source = element.select("p.about-answer").select("cite").text();
-        Integer attitudes_count = Integer.valueOf(element.select("p.about-answer").select("span.count").text().replaceAll("个赞", ""));
+        String attitudesCount = element.select("p.about-answer").select("span.count").text().replaceAll("个赞", "");
+        if(attitudesCount.contains("k")){
+            attitudesCount = attitudesCount.split("k")[0]+"000";
+        }
+        Integer attitudes_count = Integer.valueOf(attitudesCount);
         Integer comment_count = 0;
         if(!"".equals(answerText.replace("个评论", "").trim())){
-            comment_count = Integer.valueOf(answerText.replace("个评论", "").trim());
+            String commentCount = answerText.replace("个评论", "").trim();
+            if(commentCount.contains("k")){
+                commentCount = commentCount.split("k")[0]+"000";
+            }
+            comment_count = Integer.valueOf(commentCount);
         }
         zhihu = new ZhiHuData(link, title, pt, type, null, source, null, attitudes_count, null, comment_count, word);
         zhihu = analysisZhihuArticle(link, proxy, zhihu);
@@ -189,7 +198,10 @@ public class SougouZhihuCrawlerParse extends HttpClientTemplateOK {
         Integer answer_count = 0;
         answerText = answerText.replace("个回答", "").trim();
         if(answerText != null && !"".equals(answerText)){
-            answer_count = Integer.valueOf(answer_count);
+            if(answerText.contains("k")){
+                answerText = answerText.split("k")[0]+"000";
+            }
+            answer_count = Integer.valueOf(answerText);
         }
         zhihu = new ZhiHuData(link, title, pt, type, null, null, null, null, answer_count, null, word);
         zhihu = analysisZhihuAnswer(link, proxy, zhihu);
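The same "k" handling appears three times in this commit (likes, comments, answers), so it could be factored into one helper. The sketch below is not part of the commit: `CountParser` and `parseCount` are hypothetical names, and it assumes the search result may also show fractional values such as "1.2k", which the source does not confirm. With the commit's `split("k")[0] + "000"` approach, "1.2k" would become "1.2000", and `Integer.valueOf` throws a `NumberFormatException` on that string.

```java
import java.math.BigDecimal;

public class CountParser {

    /**
     * Parses a count such as "532", "3k" or "1.2k" into an int.
     * The unit suffix ("个赞", "个评论", "个回答") is expected to be
     * stripped by the caller, as in the commit.
     * Assumption: null, empty or unparseable input is treated as 0.
     */
    public static int parseCount(String raw) {
        if (raw == null) {
            return 0;
        }
        String text = raw.trim();
        if (text.isEmpty()) {
            return 0;
        }
        // A trailing "k" (either case) means the value is in thousands.
        boolean thousands = text.endsWith("k") || text.endsWith("K");
        if (thousands) {
            text = text.substring(0, text.length() - 1).trim();
        }
        try {
            BigDecimal value = new BigDecimal(text);
            if (thousands) {
                value = value.multiply(BigDecimal.valueOf(1000));
            }
            return value.intValue();
        } catch (NumberFormatException e) {
            // Assumption: fall back to 0 rather than aborting the crawl.
            return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseCount("532"));
        System.out.println(parseCount("3k"));
        System.out.println(parseCount("1.2k"));
    }
}
```

Under these assumptions, each of the three call sites would reduce to a single call, for example `Integer answer_count = CountParser.parseCount(answerText);`. Returning 0 on bad input is a design choice for the sketch; the crawler may prefer to log or rethrow instead.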