zhiwei / media_data_crawler
Commit b8eef493 authored Aug 11, 2018 by zhiwei
Merge branch 'master' of http://git.zhiweidata.top/zhangzhiwei/media_data_crawler.git

Parents: f0ddce27, 59dd3601
Showing 2 changed files with 3 additions and 2 deletions:

src/main/java/com/zhiwei/media_data_crawler/crawler/BaiduNewsCrawlerParse.java  (+0 -1)
src/main/java/com/zhiwei/media_data_crawler/crawler/BaiduTiebaCrawlerParse.java  (+3 -1)
src/main/java/com/zhiwei/media_data_crawler/crawler/BaiduNewsCrawlerParse.java
@@ -93,7 +93,6 @@ public class BaiduNewsCrawlerParse extends HttpClientTemplateOK {
     public static int getBaiduNewsCount(String word, String startTime, String endTime, Proxy proxy, String cookie) throws Exception {
         try {
             String result = downloadHtml(word, startTime, endTime, proxy, "newsdy", 1, cookie);
-            System.out.println(result);
             String s = result.split("找到相关新闻")[1];
             String s1 = s.split("篇")[0];
             s1 = s1.replace(",", "").replace("约", "");
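The hunk above keeps the count parsing in getBaiduNewsCount: the text after 找到相关新闻 ("found related news") and before 篇 (the counter word for articles) is taken, then thousands separators and 约 ("about") are stripped; the rest of the method is elided in this view. Note that `result.split("找到相关新闻")[1]` throws ArrayIndexOutOfBoundsException when the page does not contain that marker. As a sketch only, the same steps could be isolated in a helper that returns -1 instead of throwing. The class NewsCountParser, the method parseCount and the -1 sentinel are hypothetical, not part of the repository; the Chinese literals are written as Unicode escapes so the file compiles regardless of the compiler's source encoding.

```java
/**
 * Hypothetical helper (not part of media_data_crawler) that isolates the
 * count-parsing steps of getBaiduNewsCount.
 */
final class NewsCountParser {

    // Page text "found related news" that precedes the count.
    private static final String MARKER = "\u627e\u5230\u76f8\u5173\u65b0\u95fb";
    // Counter word for articles that follows the count.
    private static final String UNIT = "\u7bc7";
    // "About", prefixed to approximate counts.
    private static final String ABOUT = "\u7ea6";

    private NewsCountParser() {
    }

    /**
     * Returns the news count found in the page, or -1 when either marker is
     * missing or the text between them is not a valid int.
     */
    static int parseCount(String html) {
        if (html == null) {
            return -1;
        }
        int start = html.indexOf(MARKER);
        if (start < 0) {
            // The original split(MARKER)[1] would throw ArrayIndexOutOfBoundsException here.
            return -1;
        }
        String rest = html.substring(start + MARKER.length());
        int end = rest.indexOf(UNIT);
        if (end < 0) {
            return -1;
        }
        // Same cleanup as the original: drop thousands separators and "about".
        String digits = rest.substring(0, end).replace(",", "").replace(ABOUT, "").trim();
        try {
            return Integer.parseInt(digits);
        } catch (NumberFormatException e) {
            return -1;
        }
    }
}
```

Using indexOf rather than String.split also avoids treating the marker as a regular expression, although these particular literals contain no regex metacharacters.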
src/main/java/com/zhiwei/media_data_crawler/crawler/BaiduTiebaCrawlerParse.java
@@ -138,6 +138,8 @@ public class BaiduTiebaCrawlerParse extends HttpClientTemplateOK {
             if (title == null || title.length() < 1) {
                 title = document.select("#j_core_title_wrap > h3").text();
             }
+            String source = null;
+            source = document.select("div.card_top.clearfix > div.card_title > a").text();
             System.out.println(title);
             for (Element element : elementes) {
                 String time = null;
@@ -159,7 +161,7 @@ public class BaiduTiebaCrawlerParse extends HttpClientTemplateOK {
                 }
                 if (time != null && time.length() > 1) {
-                    TiebaData tbd = new TiebaData("http://tieba.baidu.com/p/" + aid, title, time, tid, null, author, content, aid);
+                    TiebaData tbd = new TiebaData("http://tieba.baidu.com/p/" + aid, title, time, tid, source, author, content, aid);
                     System.out.println(tbd.toString());
                     list.add(tbd);
                 }
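In BaiduTiebaCrawlerParse, the title falls back to the `#j_core_title_wrap > h3` selector when it is null or empty, and this commit adds `source`, read as the text of `div.card_top.clearfix > div.card_title > a`, which is now passed to the TiebaData constructor where `null` was passed before. Assuming `document` is a jsoup Document (as the `select(...).text()` calls suggest), `Elements.text()` returns an empty string rather than null when nothing matches, so a page without that element now stores "" instead of the previous null. If downstream code distinguishes the two, a small helper could keep the "null or empty" test in one place. This is a hypothetical sketch: FieldDefaults and its methods are not in the repository, and the helper is plain Java so it runs without jsoup.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helpers (not part of media_data_crawler) for the
 * "use a fallback when extracted text is null or empty" pattern.
 */
final class FieldDefaults {

    private FieldDefaults() {
    }

    /** Same test the parser uses for title: null or length() < 1. */
    static boolean isMissing(String value) {
        return value == null || value.isEmpty();
    }

    /** Returns primary unless it is missing; only then evaluates the fallback. */
    static String orElseGet(String primary, Supplier<String> fallback) {
        return isMissing(primary) ? fallback.get() : primary;
    }

    /** Maps "" to null so an unmatched selector is stored as null, as before this commit. */
    static String emptyToNull(String value) {
        return isMissing(value) ? null : value;
    }
}
```

In the parser this could read, for example:
    title = FieldDefaults.orElseGet(title, () -> document.select("#j_core_title_wrap > h3").text());
    String source = FieldDefaults.emptyToNull(document.select("div.card_top.clearfix > div.card_title > a").text());
The Supplier defers the fallback query so the second selector runs only when the first title is missing, matching the original if-block.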