zhiwei / media_data_crawler
Commit aacd8761, authored Aug 05, 2019 by yangchen
Change part of the proxy usage and bump the version
Parent: ea9efe8f
Showing 3 changed files with 9 additions and 10 deletions:
pom.xml (+1 -1)
src/main/java/com/zhiwei/media_data_crawler/crawler/ZhihuCrawlerParse.java (+7 -8)
src/main/java/com/zhiwei/media_data_crawler/data/DataCrawler.java (+1 -1)
pom.xml @ aacd8761
@@ -2,7 +2,7 @@
   <modelVersion>4.0.0</modelVersion>
   <groupId>com.zhiwei</groupId>
   <artifactId>media_data_crawler</artifactId>
-  <version>0.1.1-SNAPSHOT</version>
+  <version>0.1.2-SNAPSHOT</version>
   <name>media_data_crawler</name>
   <description>Online media data crawling, including Baidu News, Sogou News, 360 News, Zhihu answer lists, etc.</description>
src/main/java/com/zhiwei/media_data_crawler/crawler/ZhihuCrawlerParse.java @ aacd8761
@@ -15,6 +15,7 @@ import org.apache.logging.log4j.Logger;
 import com.alibaba.fastjson.JSONArray;
 import com.alibaba.fastjson.JSONObject;
 import com.zhiwei.crawler.core.HttpBoot;
+import com.zhiwei.crawler.proxy.ProxyHolder;
 import com.zhiwei.crawler.utils.RequestUtils;
 import com.zhiwei.media_data_crawler.data.DataCrawler;
 import com.zhiwei.media_data_crawler.entity.ZhiHuData;
@@ -43,8 +44,8 @@ public class ZhihuCrawlerParse {
      * @return List<TiebaData> return type
      */
     @SuppressWarnings("unchecked")
-    public static List<ZhiHuData> getZhihuData(String word, String timeLimit, Proxy proxy, Date endTime) throws Exception {
-        List<ZhiHuData> list = new ArrayList<ZhiHuData>();
+    public static List<ZhiHuData> getZhihuData(String word, String timeLimit, ProxyHolder proxy, Date endTime) throws Exception {
+        List<ZhiHuData> list = new ArrayList<>();
         int page = 0;
         boolean more = true;
         while (more) {
@@ -265,7 +266,7 @@ public class ZhihuCrawlerParse {
      * @return
      * @throws Exception
      */
-    private static String downloadHtml(String word, String timeLimit, Proxy proxy,
+    private static String downloadHtml(String word, String timeLimit, ProxyHolder proxy,
             int page) throws Exception {
         // Get the common request headers
         Map<String, String> headerMap = HeaderTool.getCommonHead();
@@ -300,9 +301,9 @@ public class ZhihuCrawlerParse {
      * @param @throws Exception configuration file
      * @return Map<String,Object> return type
      */
-    private static Map<String, Object> analysisData(String htmlBody, Proxy proxy, String word, Date endTime) throws Exception {
-        Map<String, Object> resultMap = new HashMap<String, Object>();
-        List<ZhiHuData> list = new ArrayList<ZhiHuData>();
+    private static Map<String, Object> analysisData(String htmlBody, ProxyHolder proxy, String word, Date endTime) throws Exception {
+        Map<String, Object> resultMap = new HashMap<>();
+        List<ZhiHuData> list = new ArrayList<>();
         boolean more = true;
         try {
             JSONArray dataJson = JSONObject.parseObject(htmlBody).getJSONArray("data");
@@ -351,7 +352,6 @@ public class ZhihuCrawlerParse {
                         }
                     } catch (Exception e) {
-                        System.out.println("=======" + objectJson);
                         continue;
                     }
                 }
             } else {
@@ -359,7 +359,6 @@ public class ZhihuCrawlerParse {
             }
         } catch (Exception e) {
             e.printStackTrace();
-            System.out.println();
             more = false;
         }
src/main/java/com/zhiwei/media_data_crawler/data/DataCrawler.java @ aacd8761
@@ -428,7 +428,7 @@ public class DataCrawler {
      * @return
      * @throws Exception
      */
-    public static List<ZhiHuData> getZhihuByWord(String word, String timeLimit, Date endDate, Proxy proxy) throws Exception {
+    public static List<ZhiHuData> getZhihuByWord(String word, String timeLimit, Date endDate, ProxyHolder proxy) throws Exception {
         try {
             return ZhihuCrawlerParse.getZhihuData(word, timeLimit, proxy, endDate);
         } catch (Exception e){
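The hunks above replace the `Proxy` parameter with `ProxyHolder` in `getZhihuByWord`, `getZhihuData`, `downloadHtml`, and `analysisData`, so the holder is passed through the whole call chain. This commit does not show the API of `com.zhiwei.crawler.proxy.ProxyHolder`, so the following is only a rough sketch of the general pattern. `SimpleProxyHolder`, `fetchPages`, and `describeRequest` are hypothetical names invented for illustration, and no network request is made.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.ArrayList;
import java.util.List;

public class ProxyHolderSketch {

    /**
     * Hypothetical stand-in for a proxy holder.
     * This is NOT the com.zhiwei.crawler.proxy.ProxyHolder API, which the commit does not show.
     */
    public static final class SimpleProxyHolder {
        private final Proxy proxy;

        private SimpleProxyHolder(Proxy proxy) {
            this.proxy = proxy;
        }

        /** Wraps an HTTP proxy; createUnresolved avoids a DNS lookup at construction time. */
        public static SimpleProxyHolder of(String host, int port) {
            return new SimpleProxyHolder(
                    new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port)));
        }

        /** Wraps a direct (no proxy) connection. */
        public static SimpleProxyHolder direct() {
            return new SimpleProxyHolder(Proxy.NO_PROXY);
        }

        public Proxy proxy() {
            return proxy;
        }
    }

    /** Mirrors the shape of the changed signatures: the holder is handed down unchanged. */
    public static List<String> fetchPages(String word, SimpleProxyHolder holder, int pages) {
        List<String> results = new ArrayList<>(); // diamond operator, as in the commit
        for (int page = 0; page < pages; page++) {
            results.add(describeRequest(word, holder, page));
        }
        return results;
    }

    /** Describes the request that would be sent; a real crawler would perform HTTP here. */
    public static String describeRequest(String word, SimpleProxyHolder holder, int page) {
        return word + "#" + page + " via " + holder.proxy().type();
    }

    public static void main(String[] args) {
        SimpleProxyHolder holder = SimpleProxyHolder.of("127.0.0.1", 8080);
        for (String line : fetchPages("zhihu", holder, 2)) {
            System.out.println(line);
        }
    }
}
```

Passing a holder instead of a raw `java.net.Proxy` means the way a proxy is obtained can change later without touching every method signature again. Whether the real `ProxyHolder` works this way is an assumption.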