Commit fc1372d5 in zhiwei/media_data_crawler, authored Mar 01, 2018 by zhiwei
Add README.md documentation file
parent ba2389f8
Showing 1 changed file with 34 additions and 0 deletions
README.md (new file, mode 0 → 100644)
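Zhiwei Search Engine Crawler JAR
==================
##### Summary
> A search-engine crawler that fetches web pages with OkHttp and parses them with Jsoup. It currently supports keyword-based collection from three sources: Baidu News, Sogou News, and 360 News.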
##### maven
    <dependency>
        <groupId>com.zhiwei</groupId>
        <artifactId>media_data_crawler</artifactId>
        <version>0.0.1-SNAPSHOT</version>
    </dependency>
##### Usage demo
    String word = "马云"; // keyword
    String startTime = "2017-03-01 00:00:00"; // start time
    String endTime = "2017-03-01 23:59:59"; // end time
    Proxy proxy = null; // proxy; leave as null if not needed
    // Baidu News demo
    List<NewsData> baiduNewsList = DataCrawler.getBaiduNewsData(word, startTime, endTime, proxy);
    // Sogou News keyword demo
    List<NewsData> sogouNewsList = DataCrawler.getSougouNewsData(word, proxy);
    // 360 News demo
    List<NewsData> soNewsList = DataCrawler.getSoNewsData(word, proxy);
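
The demo passes the time window as plain strings and the proxy as a `Proxy` object. The sketch below shows one way the arguments might be prepared using only the JDK. `CrawlerArgs` and its methods are hypothetical helpers, not part of media_data_crawler. The `yyyy-MM-dd HH:mm:ss` pattern is inferred from the demo strings, and treating `Proxy` as `java.net.Proxy` is an assumption (it is the type OkHttp accepts), so check both against the library before relying on them.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for building demo arguments; not part of media_data_crawler.
final class CrawlerArgs {
    // Pattern inferred from the demo strings, e.g. "2017-03-01 00:00:00".
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    private CrawlerArgs() {
    }

    /** First second of the given day, formatted like the demo's startTime. */
    static String dayStart(LocalDate day) {
        return day.atStartOfDay().format(FORMAT);
    }

    /** Last whole second of the given day, formatted like the demo's endTime. */
    static String dayEnd(LocalDate day) {
        return day.atTime(LocalTime.of(23, 59, 59)).format(FORMAT);
    }

    /** Builds an HTTP proxy, assuming the demo's Proxy type is java.net.Proxy. */
    static Proxy httpProxy(String host, int port) {
        // createUnresolved avoids a DNS lookup at construction time.
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2017, 3, 1);
        System.out.println(dayStart(day));
        System.out.println(dayEnd(day));
    }
}
```

With these helpers, `startTime` and `endTime` in the demo could be replaced by `CrawlerArgs.dayStart(day)` and `CrawlerArgs.dayEnd(day)`, and `proxy` by `CrawlerArgs.httpProxy(host, port)` when a proxy is needed.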