zhiwei / media_data_crawler

Commit 040405fc authored Feb 02, 2019 by [zhangzhiwei]

Fix the infinite loop when crawling a given Zhihu user's answers

parent c694f0ae

Showing 1 changed file with 11 additions and 2 deletions

src/main/java/com/zhiwei/media_data_crawler/crawler/ZhihuUserAnswerCrawlerParse.java  +11 -2
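
The `url` string in `getData` packs the answer fields to return into one percent-encoded `include` parameter, followed by `limit=20&sort_by=created&offset=`, and each request appends the current offset (`url + page`). To make that field list readable, here is a small sketch (Java 10+, for `URLDecoder.decode(String, Charset)`) that decodes the `include` value exactly as copied from the diff and rebuilds the per-request URL. The class name, the `pageUrl` helper and the sample user id are illustrative only; nothing here calls the Zhihu API.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class IncludeParamDemo {
    // Base URL as it appears in getData.
    static final String BASE = "https://www.zhihu.com/api/v4/members/";

    // Percent-encoded value of the include query parameter, copied from the diff.
    static final String INCLUDE_ENCODED = "data%5B*%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Cmark_infos%2Ccreated_time%2Cupdated_time%2Creview_info%2Cquestion%2Cexcerpt%2Cis_labeled%2Clabel_info%2Crelationship.is_authorized%2Cvoting%2Cis_author%2Cis_thanked%2Cis_nothelp%3Bdata%5B*%5D.author.badge%5B%3F(type%3Dbest_answerer)%5D.topics";

    // Hypothetical helper: rebuilds the URL getData requests for one offset
    // (the original code keeps the offset in a variable named "page").
    static String pageUrl(String userId, int offset) {
        return BASE + userId + "/answers?include=" + INCLUDE_ENCODED
                + "&limit=20&sort_by=created&offset=" + offset;
    }

    // Turns %5B/%5D into [ ], %2C into ',', %3B into ';', %3F into '?', %3D into '='.
    static String decodeInclude() {
        return URLDecoder.decode(INCLUDE_ENCODED, StandardCharsets.UTF_8);
    }

    // Splits the decoded value into ';'-separated groups, each a ','-separated field list.
    static List<List<String>> includeGroups() {
        return Arrays.stream(decodeInclude().split(";"))
                .map(group -> Arrays.asList(group.split(",")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (List<String> group : includeGroups()) {
            System.out.println(group.size() + " field(s): " + group);
        }
        System.out.println(pageUrl("some-user-id", 40)); // hypothetical user id
    }
}
```

Decoded, the first group starts with `data[*].is_normal` and lists 28 answer fields; the second group asks for `data[*].author.badge[?(type=best_answerer)].topics`.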

@@ -26,19 +26,24 @@ public class ZhihuUserAnswerCrawlerParse {
     private static HttpBoot httpBoot = new HttpBoot();
     public static List<ZhihuAnswer> getData(String userId, ProxyHolder proxy) {
         String url = "https://www.zhihu.com/api/v4/members/" + userId + "/answers?include=data%5B*%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Cmark_infos%2Ccreated_time%2Cupdated_time%2Creview_info%2Cquestion%2Cexcerpt%2Cis_labeled%2Clabel_info%2Crelationship.is_authorized%2Cvoting%2Cis_author%2Cis_thanked%2Cis_nothelp%3Bdata%5B*%5D.author.badge%5B%3F(type%3Dbest_answerer)%5D.topics&limit=20&sort_by=created&offset=";
         int page = 0;
         List<ZhihuAnswer> dataList = new ArrayList<>();
         Map<String, Object> headers = new HashMap<>();
         // headers.put("referer", "https://www.zhihu.com/people/"+userId+"/answers");
         // headers.put("user-agent", "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36");
-        // headers.put("cookie", "tgw_l7_route=116a747939468d99065d12a386ab1c5f; _xsrf=gn2oQ7N4G6yGOny4hc3T1TRr4kPOF4ij");
+        // headers.put("cookie", "_zap=37e196ce-6bf6-4680-9c40-a4b3dea72a47; _xsrf=g11of1WpkFPUYCJ88GRAlpty8bMnntT0; d_c0=\"ALDmEMw9Fw6PTsQcBCjppwDT8MbPGyQLkuo=|1534857872\"; __utma=51854390.1770583360.1534857893.1534857893.1534857893.1; __utmz=51854390.1534857893.1.1.utmcsr=baidu|utmccn=(organic)|utmcmd=organic; __utmv=51854390.000--|3=entry_date=20180821=1; z_c0=\"2|1:0|10:1545787952|4:z_c0|92:Mi4xbFFnaEFBQUFBQUFBc09ZUXpEMFhEaVlBQUFCZ0FsVk5NQ2dRWFFCUGxTLS1CczBSZWdDUzgyTFZOTmd4WHJISFR3|93b1755c91416a906602a708b0a451f7748cc1ff6fe5ee318fe2e7e15d30f101\"; tst=r; q_c1=3db855c272674e60bc301eae9948df45|1547635145000|1534857872000; tgw_l7_route=1b9b7363f02f3a5519d03bdf813bc914");
         while (true) {
             int count = 1;
-            try (Response response = httpBoot.syncCall(RequestUtils.wrapGet(url + page, headers), proxy)) {
+            String urlNewww = url + page;
+            // System.out.println("urlNew================"+urlNewww);
+            try (Response response = httpBoot.syncCall(RequestUtils.wrapGet(urlNewww, headers), proxy)) {
                 String result = response.body().string();
                 JSONObject json = JSONObject.parseObject(result);
                 JSONArray jsonArray = json.getJSONArray("data");
+                if (jsonArray != null && jsonArray.size() > 0) {
                 for (int i = 0; i < jsonArray.size(); i++) {
                     JSONObject data = jsonArray.getJSONObject(i);
                     ZhihuAnswer za = new ZhihuAnswer();

@@ -52,6 +57,10 @@ public class ZhihuUserAnswerCrawlerParse {
                     za.setComment_count(data.getInteger("comment_count"));
                     dataList.add(za);
                 }
+                } else {
+                    break;
+                }
                 int total = json.getJSONObject("paging").getInteger("totals");
                 logger.info(" 知乎用户回答采集 {} 采集第 {} 条 ,一共采集到 {} 条 ,总条数 {}", userId, page, dataList.size(), total);
                 if (dataList.size() > total || page > total) {
...
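
The new `if (jsonArray != null && jsonArray.size() > 0) { ... } else { break; }` wrapper makes a missing or empty `data` array a stop condition, alongside the existing `dataList.size() > total || page > total` check. Because `url` ends in `&offset=` and each request uses `url + page`, `page` is the API offset; the log line reads roughly "Zhihu user answer crawl {}: fetching item {}, {} collected so far, {} in total". The code that advances `page` lies outside the shown hunks. Below is a stdlib-only sketch of the same termination logic with the HTTP and JSON layers swapped for a hypothetical `PageFetcher`. `Page`, `PageFetcher`, `fakeFetcher`, stepping the offset by `LIMIT`, and the `>=` comparisons are assumptions for illustration, not code from this repository.

```java
import java.util.ArrayList;
import java.util.List;

public class PagingLoopSketch {

    // Hypothetical stand-in for a parsed response: "data" items (may be null) and "paging.totals".
    static final class Page {
        final List<String> data;
        final int totals;

        Page(List<String> data, int totals) {
            this.data = data;
            this.totals = totals;
        }
    }

    // Hypothetical abstraction over "GET url + offset, then parse the JSON body".
    interface PageFetcher {
        Page fetch(int offset);
    }

    // Matches limit=20 in the request URL.
    static final int LIMIT = 20;

    static List<String> collectAll(PageFetcher fetcher) {
        List<String> collected = new ArrayList<>();
        int offset = 0;
        while (true) {
            Page page = fetcher.fetch(offset);
            // Same idea as the guard added in this commit: no data means no more pages.
            if (page.data == null || page.data.isEmpty()) {
                break;
            }
            collected.addAll(page.data);
            // Assumption: step the offset by the requested page size.
            offset += LIMIT;
            // Assumption: stop once "totals" items are in hand or the offset passes it.
            if (collected.size() >= page.totals || offset >= page.totals) {
                break;
            }
        }
        return collected;
    }

    // Hypothetical fake fetcher serving n items "answer-0" .. "answer-(n-1)", LIMIT per request.
    static PageFetcher fakeFetcher(int n) {
        return offset -> {
            List<String> items = new ArrayList<>();
            for (int i = offset; i < Math.min(offset + LIMIT, n); i++) {
                items.add("answer-" + i);
            }
            return new Page(items, n);
        };
    }

    public static void main(String[] args) {
        System.out.println(collectAll(fakeFetcher(45)).size()); // prints 45
        // A response without "data" stops the loop after a single request.
        System.out.println(collectAll(offset -> new Page(null, 100)).size()); // prints 0
    }
}
```

In this sketch the `totals` check ends a normal crawl, while the empty-page guard covers responses that carry no `data` at all or a `totals` larger than what the API actually returns.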