zhiwei / media_data_crawler
Commit fd678cf3, authored Feb 13, 2019 by yangchen
Add retrieval of the Zhihu answer count
parent 040405fc
Showing 1 changed file with 42 additions and 14 deletions
src/main/java/com/zhiwei/media_data_crawler/crawler/ZhihuAnwserCrawlerParse.java  +42 -14
package com.zhiwei.media_data_crawler.crawler;

import java.net.Proxy;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.zhiwei.crawler.core.HttpBoot;
import com.zhiwei.crawler.core.RequestUtils;
import com.zhiwei.media_data_crawler.entity.ZhihuAnswer;
import com.zhiwei.tools.timeparse.TimeParse;
import com.zhiwei.tools.tools.ZhiWeiTools;
import okhttp3.Response;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import java.net.Proxy;
import java.util.*;
import okhttp3.Response;
/**
 * Zhihu comment crawler
...
...
@@ -22,6 +29,26 @@ public class ZhihuAnwserCrawlerParse {
    private static HttpBoot httpBoot = new HttpBoot();
    private static Logger logger = LoggerFactory.getLogger(ZhihuAnwserCrawlerParse.class);
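    /**
     * Gets the number of answers for a Zhihu question.
     *
     * @param url   URL of the question page
     * @param proxy proxy to send the request through; may be null
     * @return the answer count, or -1 if the request or parsing fails
     */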
    public static int getAnswerCount(String url, Proxy proxy) {
        try (Response response = httpBoot.syncCall(RequestUtils.wrapGet(url), proxy)) {
            String result = response.body().string();
            // Take the text between itemProp="answerCount" content=" and the next double quote.
            String content = result.split("itemProp=\"answerCount\" content=\"")[1].split("\"")[0];
            return Integer.parseInt(content);
        } catch (Exception e) {
            // Pass the exception along so the cause shows up in the log.
            logger.error("Data parsing error", e);
        }
        return -1;
    }
/**
     * Zhihu answer crawler
* @param url
...
...
@@ -216,13 +243,14 @@ public class ZhihuAnwserCrawlerParse {
    public static void main(String[] args){
        String url = "https://www.zhihu.com/question/288128510";
        Date endDate = TimeParse.stringFormartDate("2018-09-20 08:00:00");
        try{
            getAnswerList(url,endDate, null);
        }catch (Exception e){
            e.fillInStackTrace();
        }
        // String url = "https://www.zhihu.com/question/288128510";
        // Date endDate = TimeParse.stringFormartDate("2018-09-20 08:00:00");
        // try{
        // getAnswerList(url,endDate, null);
        // }catch (Exception e){
        // e.fillInStackTrace();
        // }
        getAnswerCount("https://www.zhihu.com/question/41539825", null);
    }
...
...
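Note on `getAnswerCount`: the method finds the count by splitting the page body on the literal `itemProp="answerCount" content="`, so any change in attribute spacing or casing falls through to the catch block and returns -1. The sketch below shows one possible alternative using only `java.util.regex`. It is not part of this commit. The class `AnswerCountExtractor` and its `extract` method are hypothetical names, and the sample markup in `main` is an assumption modelled on the string the commit splits on, not a captured Zhihu page.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the commit.
final class AnswerCountExtractor {

    // Matches itemProp="answerCount" content="<digits>", ignoring case and
    // tolerating whitespace around '=' and between the two attributes.
    private static final Pattern ANSWER_COUNT = Pattern.compile(
            "itemprop\\s*=\\s*\"answerCount\"\\s+content\\s*=\\s*\"(\\d+)\"",
            Pattern.CASE_INSENSITIVE);

    // Returns the answer count if the attribute pair is present, otherwise empty.
    static OptionalInt extract(String html) {
        if (html == null) {
            return OptionalInt.empty();
        }
        Matcher matcher = ANSWER_COUNT.matcher(html);
        if (!matcher.find()) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(matcher.group(1)));
        } catch (NumberFormatException e) {
            // The digits do not fit into an int.
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        // Assumed sample markup, for illustration only.
        String html = "<meta itemProp=\"answerCount\" content=\"128\"/>";
        System.out.println(extract(html));            // prints OptionalInt[128]
        System.out.println(extract("<html></html>")); // prints OptionalInt.empty
    }
}
```

Returning `OptionalInt` instead of -1 makes the "not found" case explicit for callers. Keeping the -1 convention from the commit would work equally well with `extract(html).orElse(-1)`.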