HtmlUnit: the official site's quick tutorial (translated)
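1 Environment setup:
  1) Download
  Download the latest bin package from:
  sourceforge/projects/htmlunit/files/htmlunit/
  2) About the bin package
  It contains two main parts: the .jar files in the lib directory, and the help files in the apidocs directory (the API documentation, provided as web pages; open index-all.html to start).
  3) Configure the Java CLASSPATH (fully manual method)
  Copy all the .jar files from the lib directory into any directory (for example, c:\htmlunit\lib\).
  Then right-click My Computer -> Properties -> Advanced -> Environment Variables -> System variables and edit CLASSPATH, creating it if it does not exist (if errors occur when running or compiling Java, then …).
  Add the full path of every .jar file to CLASSPATH. The bare directory "c:\htmlunit\lib\" cannot stand in for them; the correct form is .;c:\htmlunit\lib\1.jar;c:\htmlunit\lib\2.jar;
  List every jar explicitly, and note the "." at the very front (the current directory) and the ";" at the very end.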
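A quick way to confirm that the CLASSPATH edit above took effect is to print what the JVM actually sees. The sketch below is not part of the official tutorial; the class name ClasspathCheck and the helper entries are made up for illustration. It reads the standard java.class.path system property, splits it on the platform's path separator, and reports whether each entry exists on disk.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ClasspathCheck {
    /** Splits a classpath string on the given separator, dropping empty entries. */
    public static List<String> entries(String classpath, String separator) {
        List<String> result = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            if (!entry.isEmpty()) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // When no -cp option is given, this property reflects the CLASSPATH variable.
        String classpath = System.getProperty("java.class.path");
        for (String entry : entries(classpath, File.pathSeparator)) {
            // An entry that is listed but missing on disk cannot supply any classes.
            String status = new File(entry).exists() ? "ok" : "MISSING";
            System.out.println(status + "  " + entry);
        }
    }
}
```

Run it from a newly opened command prompt, since changes to environment variables are only picked up by processes started afterwards. Every HtmlUnit jar should then be listed as ok.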
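2 Notes:
  1) A .jar is simply a ZIP-format archive of compiled .class files, so it can be opened with an archive tool such as WinRAR. In essence, a .jar is a packaged directory.
  2) Some parts of the official tutorial are written oddly and are not very intuitive, so I made some adjustments, mainly to make the output easier to read.
  3) The exact usage of every function is already documented in detail in the API docs, so I will not repeat it here.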
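Note 1) above can be checked directly from Java: because a .jar is a ZIP archive, the JDK's java.util.zip.ZipFile can open it and list the "directory" inside. This is only an illustrative sketch; the class name JarListing and the method entryNames are hypothetical, and the jar path is whatever you pass on the command line.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class JarListing {
    /** Returns the names of all entries stored in the archive at the given path. */
    public static List<String> entryNames(String jarPath) {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(jarPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers do not need a throws clause.
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // args[0] is the path to any .jar, for example one copied from the lib directory.
        for (String name : entryNames(args[0])) {
            System.out.println(name);
        }
    }
}
```

Pointing it at one of the jars from the lib directory should print the package folders and .class files it contains.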
3 The tutorial, translated
3.1 Getting a page's TITLE, XML source, and text
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class helloHtmlUnit {
    public static void main(String[] args) throws Exception {
        String str;
        // Create a WebClient
        WebClient webClient = new WebClient();
        // HtmlUnit's CSS and JavaScript support is poor, so turn both off
        webClient.getOptions().setJavaScriptEnabled(false);
        webClient.getOptions().setCssEnabled(false);
        // Fetch the page
        HtmlPage page = webClient.getPage("http://www.baidu.com/");
        // Get the page TITLE
        str = page.getTitleText();
        System.out.println(str);
        // Get the page as XML
        str = page.asXml();
        System.out.println(str);
        // Get the page's visible text
        str = page.asText();
        System.out.println(str);
        // Close the WebClient
        webClient.closeAllWindows();
    }
}
3.2 Opening a page as a different browser version
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class helloHtmlUnit {
    public static void main(String[] args) throws Exception {
        String str;
        // Read the page as Firefox
        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_24);
        // HtmlUnit's CSS and JavaScript support is poor, so turn both off
        webClient.getOptions().setJavaScriptEnabled(false);
        webClient.getOptions().setCssEnabled(false);
        HtmlPage page = webClient.getPage("http://www.baidu.com/");
        str = page.getTitleText();
        System.out.println(str);
        // Close the WebClient
        webClient.closeAllWindows();
    }
}
3.3 Finding a specific element in the page
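import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlInput;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class helloHtmlUnit {
    public static void main(String[] args) throws Exception {
        // Create a WebClient
        WebClient webClient = new WebClient(BrowserVersion.CHROME);
        // HtmlUnit's CSS and JavaScript support is poor, so turn both off
        webClient.getOptions().setJavaScriptEnabled(false);
        webClient.getOptions().setCssEnabled(false);
        HtmlPage page = webClient.getPage("http://www.baidu.com/");
        // Get the "百度一下" (Baidu search) button by its id
        HtmlInput btn = page.getHtmlElementById("su");
        System.out.println(btn.getDefaultValue());
        // Close the WebClient
        webClient.closeAllWindows();
    }
}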
3.4 Searching for elements
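import java.util.List;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlDivision;
import com.gargoylesoftware.htmlunit.html.HtmlInput;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class helloHtmlUnit {
    public static void main(String[] args) throws Exception {
        // Create a WebClient
        WebClient webClient = new WebClient(BrowserVersion.CHROME);
        // HtmlUnit's CSS and JavaScript support is poor, so turn both off
        webClient.getOptions().setJavaScriptEnabled(false);
        webClient.getOptions().setCssEnabled(false);
        HtmlPage page = webClient.getPage("http://www.baidu.com/");
        // Find all div elements and print the first one
        List<?> hbList = page.getByXPath("//div");
        HtmlDivision hb = (HtmlDivision) hbList.get(0);
        System.out.println(hb.toString());
        // Find and fetch one specific input
        List<?> inputList = page.getByXPath("//input[@id='su']");
        HtmlInput input = (HtmlInput) inputList.get(0);
        System.out.println(input.toString());
        // Close the WebClient
        webClient.closeAllWindows();
    }
}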
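The two XPath expressions used above can be tried without any network access. The sketch below deliberately swaps in the JDK's own javax.xml.xpath API instead of HtmlUnit, and runs the same expressions against a small made-up document (SAMPLE is invented for illustration and is not Baidu's real markup). The class name XPathDemo and the helper select are likewise hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathDemo {
    // A tiny, made-up, well-formed document standing in for a real page.
    public static final String SAMPLE =
        "<html><body>"
        + "<div id='head'><input id='kw' type='text'/></div>"
        + "<div id='foot'><input id='su' type='submit' value='Search'/></div>"
        + "</body></html>";

    /** Evaluates an XPath expression against XML text and returns the matching nodes. */
    public static NodeList select(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or XML: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // "//div" matches every div element anywhere in the document.
        System.out.println(select(SAMPLE, "//div").getLength());
        // "//input[@id='su']" matches only the input whose id attribute is "su".
        Element button = (Element) select(SAMPLE, "//input[@id='su']").item(0);
        System.out.println(button.getAttribute("value"));
    }
}
```

The expressions mean the same thing in HtmlUnit's getByXPath: "//" searches the whole document, and the "[@id='su']" predicate filters by attribute value. That is why the first query returns a list of many elements while the second returns a list with a single input.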
3.5 Submitting a search
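import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlInput;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class helloHtmlUnit {
    public static void main(String[] args) throws Exception {
        // Create a WebClient
        WebClient webClient = new WebClient(BrowserVersion.CHROME);
        // HtmlUnit's CSS and JavaScript support is poor, so turn both off
        webClient.getOptions().setJavaScriptEnabled(false);
        webClient.getOptions().setCssEnabled(false);
        HtmlPage page = webClient.getPage("http://www.baidu.com/");
        // Get the search box and fill in the search term
        HtmlInput input = page.getHtmlElementById("kw");
        System.out.println(input.toString());
        input.setValueAttribute("雅蠛蝶");
        System.out.println(input.toString());
        // Get the search button and click it
        HtmlInput btn = page.getHtmlElementById("su");
        HtmlPage page2 = btn.click();
        // Print the text of the results page
        System.out.println(page2.asText());
        // Close the WebClient
        webClient.closeAllWindows();
    }
}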
