Commit 6c9f649a by zhiwei

Fix Sogou WeChat search links in which https appears twice

parent 9159a942
......@@ -3,7 +3,7 @@
<modelVersion>4.0.0</modelVersion>
<groupId>com.zhiwei</groupId>
<artifactId>wechat</artifactId>
-<version>1.3.0-SNAPSHOT</version>
+<version>1.3.1-SNAPSHOT</version>
<description>
Zhiwei WeChat collection program, including
1. WeChat historical article collection
......
......@@ -300,7 +300,11 @@ public class WechatAritcleSearch {
 for (Element element : elements) {
     try {
         title = element.select("div.txt-box").select("h3").text();
-        link = "https://weixin.sogou.com" + element.select("div.txt-box").select("h3 >a").attr("href");
+        link = element.select("div.txt-box").select("h3 >a").attr("href");
+        if(!link.contains("https")){
+            link = "https://weixin.sogou.com" + link;
+        }
         content = "";
         if (element.select("p.txt-info").isEmpty()) {
             content = element.select("p.txt-info").text();
......
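
A possible follow-up to the link handling in this change: `contains("https")` also matches a relative href that merely carries "https" somewhere in its query string, in which case the host prefix would be skipped. Whether Sogou ever returns such hrefs is an assumption, not something this commit shows. A minimal sketch of a prefix-based check follows; the class and method names are hypothetical and not part of the repository.

```java
// Hypothetical helper, not part of the commit: turns a scraped href into an absolute link.
final class SogouLinkNormalizer {
    // Host prefix taken from the diff above.
    private static final String BASE = "https://weixin.sogou.com";

    static String normalize(String href) {
        if (href == null || href.trim().isEmpty()) {
            return "";
        }
        String trimmed = href.trim();
        // Already absolute: check the scheme at the start only, not anywhere in the string.
        if (trimmed.startsWith("http://") || trimmed.startsWith("https://")) {
            return trimmed;
        }
        // Relative: prepend the host, adding a slash if the href lacks one.
        return trimmed.startsWith("/") ? BASE + trimmed : BASE + "/" + trimmed;
    }
}
```

With this sketch, `normalize("/link?url=https%3A%2F%2Fexample.com")` still receives the host prefix, while an href that already starts with `https://` is returned unchanged.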