zhiwei / media_data_crawler
Commit 362773e1
Authored Aug 11, 2018 by zhiwei
Parent: b8eef493

Remove some log output

Showing 1 changed file with 0 additions and 4 deletions

src/main/java/com/zhiwei/media_data_crawler/crawler/BaiduTiebaCrawlerParse.java (+0 -4)
@@ -103,7 +103,6 @@ public class BaiduTiebaCrawlerParse extends HttpClientTemplateOK {
         }
         String ur = url + "?pn=" + page;
         String htmlBody = downloadHtml(ur, proxy);
-        System.out.println(url + "------------" + aid);
         if (htmlBody != null) {
             Map<String, Object> dataMap = analysisDataAnswer(htmlBody, aid);
             List<TiebaData> dataList = (List<TiebaData>) dataMap.get("data");
@@ -140,7 +139,6 @@ public class BaiduTiebaCrawlerParse extends HttpClientTemplateOK {
         }
         String source = null;
         source = document.select("div.card_top.clearfix > div.card_title > a").text();
-        System.out.println(title);
         for (Element element : elementes) {
             String time = null;
             String content = null;
@@ -162,7 +160,6 @@ public class BaiduTiebaCrawlerParse extends HttpClientTemplateOK {
...
@@ -162,7 +160,6 @@ public class BaiduTiebaCrawlerParse extends HttpClientTemplateOK {
if
(
time
!=
null
&&
time
.
length
()
>
1
)
{
if
(
time
!=
null
&&
time
.
length
()
>
1
)
{
TiebaData
tbd
=
new
TiebaData
(
"http://tieba.baidu.com/p/"
+
aid
,
title
,
time
,
tid
,
source
,
author
,
content
,
aid
);
TiebaData
tbd
=
new
TiebaData
(
"http://tieba.baidu.com/p/"
+
aid
,
title
,
time
,
tid
,
source
,
author
,
content
,
aid
);
System
.
out
.
println
(
tbd
.
toString
());
list
.
add
(
tbd
);
list
.
add
(
tbd
);
}
}
}
}
...
@@ -316,7 +313,6 @@ public class BaiduTiebaCrawlerParse extends HttpClientTemplateOK {
                 author = element.select("a").select("font.p_violet").text().split(" ")[1];
                 time = element.select("font.p_date").text();
                 TiebaData tiebaData = new TiebaData(link, title, time, tid, source, author, content, word);
-                System.out.println(tiebaData);
                 list.add(tiebaData);
             } catch (Exception e) {
                 logger.debug("无作者 或者 无来源");
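
All four deletions remove unconditional System.out.println calls. If that output is still useful while debugging, one option is to send it through a logger at debug level so it can be switched off by configuration instead of by editing code. The sketch below is only an illustration under stated assumptions: it uses java.util.logging from the JDK as a stand-in, because the diff does not show which logging library backs the project's `logger`; the class name `DebugLogSketch`, the logger name, and the `CapturingHandler` helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class DebugLogSketch {
    // Hypothetical logger name; the project's real logger is not shown in the diff.
    static final Logger LOG = Logger.getLogger("crawler.sketch");

    /** Keeps published messages in memory so the effect of the level is observable. */
    static final class CapturingHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            if (isLoggable(record)) {
                messages.add(record.getMessage());
            }
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    /** Logs the page URL and thread id at FINE (debug) level instead of printing them. */
    static void logPage(String url, String aid) {
        // The supplier is only evaluated when FINE is enabled for this logger.
        LOG.fine(() -> url + "------------" + aid);
    }

    /** Runs logPage once with the logger set to the given level and returns what was captured. */
    static List<String> run(Level level) {
        CapturingHandler handler = new CapturingHandler();
        LOG.setUseParentHandlers(false); // keep the console quiet for this sketch
        LOG.setLevel(level);
        LOG.addHandler(handler);
        try {
            logPage("http://tieba.baidu.com/p/123", "123"); // "123" is a made-up thread id
        } finally {
            LOG.removeHandler(handler);
        }
        return handler.messages;
    }

    public static void main(String[] args) {
        System.out.println("captured at INFO: " + run(Level.INFO).size());
        System.out.println("captured at FINE: " + run(Level.FINE).size());
    }
}
```

With the level at INFO the message is dropped before the string is even built; lowering the level to FINE brings it back without a code change.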