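Parsing HTML in Java with the Document class
At work today I needed to parse an HTML page and extract the contents of certain tags, so I am writing it down here:
First, thanks to two posts:
References:
Next, my own use case: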
<DIV class="navbar navbar-inverse navbar-fixed-top">
<DIV class="navbar-inner">
<DIV class="container-fluid">
<a class="brand lnk-file-title" STYLE="text-decoration: none; width: 200px" TITLE=" "> </a>
<a id="btnPrint" STYLE="margin:0px;padding:10px;" href="javascript:;" onClick="printDoc()">
<img src="./1281e94387f5efb28be502f828edc032.files/print.png">
</a>
<DIV class="changePage">
<a class="pageUp" href="javascript:;" onClick="slidePage(0)"></a>
<a class="pageDown" href="javascript:;" onClick="slidePage(1)"></a>
<SPAN STYLE="padding:0px 10px 0px 10px">Page:</SPAN>
<INPUT class="activePage" type="text" Value="1" onBlur="changePage(this.value)" onkeyup="this.value=this.value.replace(/[^0-9]/g,'')" onafterpaste="this.value=this.value.replace(/[^0-9]/g,'')">
<SPAN class="totalPage"></SPAN>
</DIV>
</DIV>
</DIV>
</DIV>
<DIV id="printArea" STYLE="display:none"></DIV>
<DIV class="container-fluid container-fluid-content">
<DIV class="row-fluid">
<DIV class="span12 docArea">
<DIV class="word-page" STYLE="width:921px;height:1275px" data-loaded="true">
<DIV class="word-content">
<embed src="1281e94387f5efb28be502f828edc032.files/1.svg" width="100%" height="100%" type="image/svg+xml"></embed>
</DIV>
</DIV>
<DIV class="word-page" STYLE="width:921px;height:1275px" data-loaded="true">
<DIV class="word-content">
<embed src="1281e94387f5efb28be502f828edc032.files/2.svg" width="100%" height="100%" type="image/svg+xml"></embed>
</DIV>
</DIV>
<DIV class="word-page" STYLE="width:921px;height:1275px" data-loaded="true">
<DIV class="word-content">
<embed src="1281e94387f5efb28be502f828edc032.files/3.svg" width="100%" height="100%" type="image/svg+xml"></embed>
</DIV>
</DIV>
<DIV class="word-page" STYLE="width:921px;height:1275px" data-loaded="true">
<DIV class="word-content">
<embed src="1281e94387f5efb28be502f828edc032.files/4.svg" width="100%" height="100%" type="image/svg+xml"></embed>
</DIV>
</DIV>
<DIV class="word-page" STYLE="width:921px;height:1275px" data-loaded="true">
<DIV class="word-content">
<embed src="1281e94387f5efb28be502f828edc032.files/5.svg" width="100%" height="100%" type="image/svg+xml"></embed>
</DIV>
</DIV>
<DIV class="word-page" STYLE="width:921px;height:1275px">
<DIV class="word-content"></DIV>
</DIV>
</DIV>
</DIV>
</DIV>
What I need is the value of the src attribute of each embed tag:
Combining the approaches from the two posts above:
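import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class EmbedSrcExtractor {
    // Returns the value of attribute attr for every <element ...> tag found in source.
    public static List<String> match(String source, String element, String attr) {
        List<String> result = new ArrayList<String>();
        // Group 1 captures the attribute value; the pattern expects whitespace after the value.
        String reg = "<" + element + "[^<>]*?\\s" + attr + "=['\"]?(.*?)['\"]?\\s.*?>";
        Matcher m = Pattern.compile(reg).matcher(source);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Address reproduced as given; java.net.URL needs an absolute URL including the scheme.
        // The second argument is the timeout in milliseconds.
        Document doc = Jsoup.parse(new URL("docv.hdkt100/2018/11/1281e94387f5efb28be502f828edc032.html"), 100000);
        String html = doc.toString();
        // String source = "<a title=中国体育报 href=''>aaa</a><a title='北京日报' href=''>bbb</a>";
        List<String> list = match(html, "embed", "src");
        System.out.println(list);
    }
}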
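To see what the pattern in match captures without fetching the page, here is a small offline sketch that uses only java.util.regex. The class name MatchDemo, the sample markup, and the matchStrict variant are my own illustrations and not part of the two posts. It also shows a limitation: because the original pattern requires whitespace after the attribute value, a tag whose last attribute is the one you want is not matched. matchStrict is one possible way around that, not the only one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchDemo {
    // Same pattern as in the post: expects whitespace after the attribute value.
    public static List<String> match(String source, String element, String attr) {
        List<String> result = new ArrayList<>();
        String reg = "<" + element + "[^<>]*?\\s" + attr + "=['\"]?(.*?)['\"]?\\s.*?>";
        Matcher m = Pattern.compile(reg).matcher(source);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    // Hypothetical stricter variant: accepts double-quoted, single-quoted, or
    // unquoted values and does not need anything after the value.
    public static List<String> matchStrict(String source, String element, String attr) {
        List<String> result = new ArrayList<>();
        String reg = "<" + Pattern.quote(element) + "\\b[^<>]*?\\s" + Pattern.quote(attr)
                + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'<>]+))";
        Matcher m = Pattern.compile(reg, Pattern.CASE_INSENSITIVE).matcher(source);
        while (m.find()) {
            // Exactly one of the three groups is set, depending on the quoting style.
            for (int g = 1; g <= 3; g++) {
                if (m.group(g) != null) {
                    result.add(m.group(g));
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String page = "<embed src=\"1.svg\" width=\"100%\" type=\"image/svg+xml\"></embed>"
                + "<embed src=\"2.svg\" width=\"100%\" type=\"image/svg+xml\"></embed>";
        System.out.println(match(page, "embed", "src"));            // [1.svg, 2.svg]

        String lastAttr = "<embed src=\"a.svg\">";
        System.out.println(match(lastAttr, "embed", "src"));        // []
        System.out.println(matchStrict(lastAttr, "embed", "src"));  // [a.svg]
    }
}
```

For the page in this post the original pattern is enough, since every embed tag has further attributes after src. For HTML you do not control, a real parser is usually the safer choice than a regular expression.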
The jar dependency used:
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.10.2</version>
</dependency>
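The src values in the page above are relative paths (for example 1281e94387f5efb28be502f828edc032.files/1.svg), so they have to be resolved against the page address before the SVG files can be requested. A minimal sketch with java.net.URI follows. The class name SrcResolver and the example.com address are placeholders I made up for illustration, not the real page.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SrcResolver {
    // Resolves each relative src against the address of the page it came from.
    public static List<String> resolveAll(String pageUrl, List<String> srcValues) {
        URI base = URI.create(pageUrl);
        List<String> absolute = new ArrayList<>();
        for (String src : srcValues) {
            // URI.resolve drops the last path segment of the base and normalizes "./".
            absolute.add(base.resolve(src).toString());
        }
        return absolute;
    }

    public static void main(String[] args) {
        // Placeholder address, assumed for illustration only.
        String pageUrl = "http://example.com/2018/11/page.html";
        List<String> srcs = Arrays.asList("page.files/1.svg", "./page.files/2.svg");
        // Prints [http://example.com/2018/11/page.files/1.svg, http://example.com/2018/11/page.files/2.svg]
        System.out.println(resolveAll(pageUrl, srcs));
    }
}
```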