# SimulationFang **Repository Path**: dgwcode/SimulationFang ## Basic Information - **Project Name**: SimulationFang - **Description**: SimulationFang Java爬虫 Htmlunit - **Primary Language**: Java - **License**: Apache-2.0 - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 2 - **Forks**: 1 - **Created**: 2018-11-18 - **Last Updated**: 2022-07-07 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # SimulationFang #### 项目介绍 SimulationFang Java爬虫 Htmlunit #### 软件架构 HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc... just like you do in your "normal" browser. It has fairly good JavaScript support (which is constantly improving) and is able to work even with quite complex AJAX libraries, simulating Chrome, Firefox or Internet Explorer depending on the configuration used. It is typically used for testing purposes or to retrieve information from web sites. HtmlUnit is not a generic unit testing framework. It is specifically a way to simulate a browser for testing purposes and is intended to be used within another testing framework such as JUnit or TestNG. Refer to the document "Getting Started with HtmlUnit" for an introduction. HtmlUnit is used as the underlying "browser" by different Open Source tools like Canoo WebTest, JWebUnit, WebDriver, JSFUnit, WETATOR, Celerity, Spring MVC Test HtmlUnit, ... HtmlUnit was originally written by Mike Bowler of Gargoyle Software and is released under the Apache 2 license. Since then, it has received many contributions from other developers, and would not be where it is today without their assistance. #### 安装教程 Introduction The dependencies page lists all the jars that you will need to have in your classpath. The class com.gargoylesoftware.htmlunit.WebClient is the main starting point. This simulates a web browser and will be used to execute all of the tests. Most unit testing will be done within a framework like JUnit so all the examples here will assume that we are using that. In the first sample, we create the web client and have it load the homepage from the HtmlUnit website. We then verify that this page has the correct title. Note that getPage() can return different types of pages based on the content type of the returned data. In this case we are expecting a content type of text/html so we cast the result to an com.gargoylesoftware.htmlunit.html.HtmlPage. ``` @Test public void homePage() throws Exception { try (final WebClient webClient = new WebClient()) { final HtmlPage page = webClient.getPage("http://htmlunit.sourceforge.net"); Assert.assertEquals("HtmlUnit - Welcome to HtmlUnit", page.getTitleText()); final String pageAsXml = page.asXml(); Assert.assertTrue(pageAsXml.contains("
")); final String pageAsText = page.asText(); Assert.assertTrue(pageAsText.contains("Support for the HTTP and HTTPS protocols")); } } ``` Submitting a form Frequently we want to change values in a form and submit the form back to the server. The following example shows how you might do this. ``` @Test public void submittingForm() throws Exception { try (final WebClient webClient = new WebClient()) { // Get the first page final HtmlPage page1 = webClient.getPage("http://some_url"); // Get the form that we are dealing with and within that form, // find the submit button and the field that we want to change. final HtmlForm form = page1.getFormByName("myform"); final HtmlSubmitInput button = form.getInputByName("submitbutton"); final HtmlTextInput textField = form.getInputByName("userid"); // Change the value of the text field textField.type("root"); // Now submit the form by clicking the button and get back the second page. final HtmlPage page2 = button.click(); } } ``` Imitating a specific browser Often you will want to simulate a specific browser. This is done by passing a com.gargoylesoftware.htmlunit.BrowserVersion into the WebClient constructor. Constants have been provided for some common browsers but you can create your own specific version by instantiating a BrowserVersion. ``` @Test public void homePage_Firefox() throws Exception { try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_52)) { final HtmlPage page = webClient.getPage("http://htmlunit.sourceforge.net"); Assert.assertEquals("HtmlUnit - Welcome to HtmlUnit", page.getTitleText()); } } ``` Specifying this BrowserVersion will change the user agent header that is sent up to the server and will change the behavior of some of the JavaScript. Finding a specific element Once you have a reference to an HtmlPage, you can search for a specific HtmlElement by one of 'get' methods, or by using XPath or CSS selectors. Traversing the DOM tree Below is an example of finding a 'div' by an ID, and getting an anchor by name: ``` @Test public void getElements() throws Exception { try (final WebClient webClient = new WebClient()) { final HtmlPage page = webClient.getPage("http://some_url"); final HtmlDivision div = page.getHtmlElementById("some_div_id"); final HtmlAnchor anchor = page.getAnchorByName("anchor_name"); } } ``` A simple way for finding elements might be to find all elements of a specific type. ``` @Test public void getElements() throws Exception { try (final WebClient webClient = new WebClient()) { final HtmlPage page = webClient.getPage("http://some_url"); NodeList inputs = page.getElementsByTagName("input"); final Iterator