zhiwei / media_data_crawler
Commit af2398c5, authored Sep 19, 2018 by yangchen
Baidu count retrieval failed (百度获取量失败)
parent a7d988a7
Showing 4 changed files with 4 additions and 2 deletions

pom.xml  +1 -1
src/main/java/com/zhiwei/media_data_crawler/crawler/BaiduNewsCrawlerParse.java  +1 -1
src/main/java/com/zhiwei/media_data_crawler/crawler/SougouNewsCrawlerParse.java  +1 -0
src/main/java/com/zhiwei/media_data_crawler/data/DataCrawler.java  +1 -0

pom.xml
@@ -2,7 +2,7 @@
   <modelVersion>4.0.0</modelVersion>
   <groupId>com.zhiwei</groupId>
   <artifactId>media_data_crawler</artifactId>
-  <version>0.0.2-SNAPSHOT</version>
+  <version>0.0.3-SNAPSHOT</version>
   <name>media_data_crawler</name>
   <description>网媒数据抓取,包含百度新闻、搜狗新闻、360新闻等</description>

src/main/java/com/zhiwei/media_data_crawler/crawler/BaiduNewsCrawlerParse.java
@@ -271,7 +271,7 @@ public class BaiduNewsCrawlerParse {
 for (int i = 1; i <= 3; i++) {
     try {
         Response response = HttpBoot.syncCall(RequestUtils.wrapGet(url, headerMap), proxy, false);
-        return response.body().toString();
+        return response.body().string();
     } catch (Exception e) {
         logger.error("获取数据时出现问题,问题为:{}", e.fillInStackTrace());
         if (i == 3){
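
Note on the hunk above: the one-line change from toString() to string() is probably what resolves the failure named in the commit message. This assumes that Response is an OkHttp-style response whose body type does not override toString(); the diff does not confirm this. The sketch below illustrates the difference with a hypothetical stand-in class, FakeBody, which is not part of this project or of any library.

```java
import java.nio.charset.StandardCharsets;

public class BodyToStringDemo {

    // Hypothetical stand-in for a response body. It only mimics the
    // relevant behavior and is NOT the class used by the crawler.
    static final class FakeBody {
        private final byte[] bytes;

        FakeBody(String content) {
            this.bytes = content.getBytes(StandardCharsets.UTF_8);
        }

        // Decodes the payload, analogous to the string() call in the fix.
        String string() {
            return new String(bytes, StandardCharsets.UTF_8);
        }

        // toString() is deliberately not overridden, so the default
        // Object.toString() (class name + '@' + hex hash code) applies.
    }

    // What the old code effectively returned: an object identifier.
    static String viaToString(FakeBody body) {
        return body.toString();
    }

    // What the new code returns: the decoded payload.
    static String viaString(FakeBody body) {
        return body.string();
    }

    public static void main(String[] args) {
        FakeBody body = new FakeBody("<html>results</html>");
        System.out.println(viaToString(body)); // identifier, not the payload
        System.out.println(viaString(body));   // the payload itself
    }
}
```

If that assumption holds, any parser that received the toString() value would have found no result count in it, which is consistent with the commit message.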

src/main/java/com/zhiwei/media_data_crawler/crawler/SougouNewsCrawlerParse.java
@@ -61,6 +61,7 @@ public class SougouNewsCrawlerParse {
     more = false;
 }
 page++;
+logger.info("采集到 {} 页 采集的数据量为 {}", page, list.size());
 if (DataCrawler.sleepTime == null){
     ZhiWeiTools.sleep(5000);
 }

src/main/java/com/zhiwei/media_data_crawler/data/DataCrawler.java
@@ -154,6 +154,7 @@ public class DataCrawler {
  */
 public static List<NewsData> getSougouNewsData(String word, Proxy proxy) {
     try {
+        System.out.println("开始采集sogou");
         return SougouNewsCrawlerParse.getSougouNewsData(word, proxy);
     } catch (Exception e) {
         e.printStackTrace();