zhiwei / source_forward
Commit 7c541080, authored Oct 28, 2019 by yangchen
Switch full-text extraction to Jinhao's (晋豪) extraction
Parent: a715f4b8
Showing 1 changed file with 17 additions and 10 deletions
src/main/java/com/zhiwei/source_forward/util/MatchContent.java @ 7c541080
@@ -8,8 +8,7 @@ import org.jsoup.nodes.Document;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
-import com.zhiwei.source_forward.content.ContentExtractor;
-import com.zhiwei.source_forward.content.News;
+import com.kohlschutter.boilerpipe.extractors.ArticleExtractor;
 import com.zhiwei.tools.tools.ZhiWeiTools;
 /**
@@ -101,14 +100,22 @@ public class MatchContent {
      */
     private static String mathchContent(String html, Document document){
         /** Body-text extraction, to guard against the body text itself containing the corresponding match rules **/
         String content = null;
         try {
-            News news = ContentExtractor.getNewsByHtml(html);
-            content = TreateData.filterSpecialCharacter(news.getContent());
+            content = ArticleExtractor.getInstance().getText(html);
         } catch (Exception e) {
             logger.error("Body-text extraction failed, falling back to full-page text:", e);
             content = document.text();
         }
+//        String content = null;
+//        try {
+//            News news = ContentExtractor.getNewsByHtml(html);
+//            content = TreateData.filterSpecialCharacter(news.getContent());
+//        } catch (Exception e) {
+//            logger.error("Body-text extraction failed, falling back to full-page text:",e);
+//            content = document.text();
+//        }
         return content;
     }
 }
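The try/catch in `mathchContent` follows an "extract, else fall back" pattern: attempt body-text extraction, and if it throws, use the full-page text instead. Below is a minimal, dependency-free sketch of that pattern. `BodyExtractor`, `BodyTextFallback`, and `extractOrFallback` are hypothetical names introduced only for illustration; they stand in for the extractor call and the `document.text()` fallback in the diff and are not part of this repository, boilerpipe, or jsoup.

```java
// Hypothetical stand-in for a body-text extractor (for example, the
// ArticleExtractor call in the diff). The interface is an assumption of this sketch.
interface BodyExtractor {
    String extract(String html) throws Exception;
}

final class BodyTextFallback {
    private BodyTextFallback() {}

    /**
     * Returns the extractor's result, or fullText if the extractor throws.
     * This mirrors the try/catch shape of mathchContent in the diff.
     */
    static String extractOrFallback(String html, String fullText, BodyExtractor extractor) {
        try {
            return extractor.extract(html);
        } catch (Exception e) {
            // The original logs through SLF4J; stderr keeps this sketch self-contained.
            System.err.println("Body-text extraction failed, using full-page text: " + e);
            return fullText;
        }
    }

    public static void main(String[] args) {
        String html = "<html><body><nav>Home</nav><p>Article body</p></body></html>";
        String fullText = "Home Article body"; // what a whole-document text dump might contain

        BodyExtractor working = h -> "Article body";
        BodyExtractor failing = h -> { throw new IllegalStateException("cannot parse"); };

        System.out.println(extractOrFallback(html, fullText, working));
        System.out.println(extractOrFallback(html, fullText, failing));
    }
}
```

As in the committed code, the fallback only triggers on an exception. If the extractor returns `null` or an empty string without throwing, that value is passed through unchanged, so a caller that needs non-empty text would have to check for that separately.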