TheDeveloperBlog.com

Java Download Web Pages: URL and openStream

Use the URL, URI and InputStream classes to download a web page. Read in an entire remote HTML file.
Download URL, openStream. A remote HTML file contains important information. With Java's URI and URL classes we can download it and use its contents in a String.
With openStream, we obtain a stream of the file contents. With a buffer array, we can create a string from the data we download. A StringBuilder here is helpful.
First program. This example implements a getPage method. It takes a file from a remote address and places it into a new String. There are some complexities in getPage.

URI: We first create a URI object from the address argument (a String). This is used to create a new URL object.

InputStream: We invoke openStream on our URL instance to get a readable stream of the file contents.

Read: We use a while-loop to read the InputStream into a byte array. We then append to a StringBuilder to get the total file.

Result: We can see that on the "Example" domain, it fetched the correct HTML document. The document is more than 1024 bytes.
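Before the full program, the URI step from the notes above can be tried in isolation, without any network access. This is a minimal sketch: the class name UriToUrlSketch and the helper toUrlOrNull are hypothetical names chosen for illustration, and returning null on failure is only one possible design.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class UriToUrlSketch {

    // Hypothetical helper: returns a URL for the address,
    // or null if the address cannot be converted.
    static URL toUrlOrNull(String address) {
        try {
            // URI parses the address. toURL then requires an
            // absolute URI with a supported protocol.
            return new URI(address).toURL();
        } catch (URISyntaxException
                | MalformedURLException
                | IllegalArgumentException ex) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Absolute address with a protocol: prints true.
        System.out.println(toUrlOrNull("http://www.example.com/") != null);
        // No protocol, so the URI is not absolute: prints false.
        System.out.println(toUrlOrNull("www.example.com/") != null);
        // A space is not legal in a URI: prints false.
        System.out.println(toUrlOrNull("http://bad address/") != null);
    }
}
```

No connection is opened here. The network is only contacted later, when openStream is called on the URL.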

Java program that downloads web page, uses StringBuilder

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URISyntaxException;
    import java.net.URL;
    import java.net.URI;

    public class Program {

        public static String getPage(String address)
                throws IOException, URISyntaxException {
            // Get URI and URL objects.
            URI uri = new URI(address);
            URL url = uri.toURL();

            // Store results in StringBuilder.
            StringBuilder builder = new StringBuilder();
            byte[] data = new byte[1024];

            // Get stream of the response.
            // ... The try-with-resources statement closes it when done.
            try (InputStream in = url.openStream()) {
                // Read in the response into the buffer.
                // ... Read many bytes each iteration.
                int c;
                while ((c = in.read(data, 0, 1024)) != -1) {
                    builder.append(new String(data, 0, c));
                }
            }
            // Return String.
            return builder.toString();
        }

        public static void main(String[] args) {
            try {
                String page = getPage("http://www.example.com/");
                System.out.println(page);
            } catch (Exception ex) {
                System.out.println("ERROR");
            }
        }
    }

Output

    <!doctype html>
    <html>
    <head>
        <title>Example Domain</title>
        <meta charset="utf-8" />

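One caveat with the getPage loop above: each chunk of bytes is turned into a String on its own, using the default charset. A character encoded as several bytes can straddle two chunks and be decoded incorrectly. The following sketch shows one possible alternative. It assumes the page is UTF-8, and the names ReadAllSketch and readAll are hypothetical. An in-memory stream stands in for url.openStream() so the sketch runs offline.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class ReadAllSketch {

    // Hypothetical helper: reads the whole stream as UTF-8 text.
    static String readAll(InputStream in) throws IOException {
        StringBuilder builder = new StringBuilder();
        char[] buffer = new char[1024];
        // The reader decodes bytes to chars across chunk boundaries.
        // try-with-resources closes the reader and the wrapped stream.
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int count;
            while ((count = reader.read(buffer, 0, buffer.length)) != -1) {
                builder.append(buffer, 0, count);
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for url.openStream() (an assumption for the demo).
        byte[] bytes = "<title>Caf\u00e9</title>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(bytes)));
    }
}
```

The loop keeps the same shape as in getPage. Only the buffer type changes, from byte[] to char[], because decoding happens inside the InputStreamReader.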
Short example. I developed this program when learning to use URI and URL objects. It creates a BufferedInputStream from the InputStream.

However: For a single bulk read like this one, a BufferedInputStream likely has little advantage over using the InputStream directly. Buffering mainly helps when a program makes many small reads.

Also: When we have a byte array, we can convert it into a String with the String constructor.

So: With this method, we can quickly download the first bytes of a document. This is helpful if we only need a small piece of a document.

Java program that uses URI, URL and InputStream

    import java.io.BufferedInputStream;
    import java.io.InputStream;
    import java.net.URL;
    import java.net.URI;

    public class Program {
        public static void main(String[] args) throws Exception {
            // Create URI and URL objects.
            URI uri = new URI("http://en.wikipedia.org/wiki/Main_Page");
            URL url = uri.toURL();

            // Use a BufferedInputStream.
            // ... The try-with-resources statement closes both streams.
            try (InputStream in = url.openStream();
                 BufferedInputStream reader = new BufferedInputStream(in)) {
                // Read in up to the first 200 bytes from the website.
                byte[] data = new byte[200];
                int count = reader.read(data, 0, 200);

                // Convert the bytes that were read to a String.
                String result = new String(data, 0, Math.max(count, 0));
                System.out.println(result);
            }
        }
    }

Output

    <!DOCTYPE html>
    <html lang="en" dir="ltr" class="client-nojs">
    <head>
    <meta charset="UTF-8" />
    <title>Wikipedia, the free encyclopedia</title>
    ...

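A single read call may return fewer bytes than requested, even when more data is on the way. As a hedged alternative, the sketch below uses readNBytes, which is available on InputStream from Java 11 and keeps reading until the requested count arrives or the stream ends. The names FirstBytesSketch and readPrefix are hypothetical, UTF-8 is assumed, and an in-memory stream stands in for url.openStream().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class FirstBytesSketch {

    // Hypothetical helper: returns up to `limit` leading bytes as UTF-8 text.
    static String readPrefix(InputStream in, int limit) throws IOException {
        try (InputStream stream = in) {
            // readNBytes returns an array sized to the bytes actually read,
            // so no unused trailing zero bytes end up in the String.
            byte[] data = stream.readNBytes(limit);
            return new String(data, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for url.openStream() (an assumption for the demo).
        byte[] page = "<!DOCTYPE html><html><head></head></html>"
                .getBytes(StandardCharsets.UTF_8);
        // Prints: <!DOCTYPE html>
        System.out.println(readPrefix(new ByteArrayInputStream(page), 15));
    }
}
```

Cutting at a fixed byte count can still split a multi-byte character at the very end of the prefix. For a quick look at the start of a document, that is usually acceptable.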
To download web pages, we combine many classes. We use URI and URL objects to start, and an InputStream to get the data. A byte array is a suitable buffer.

And: A StringBuilder may also be used. In the getPage method above, we fetch an entire web page as a String.

Some notes. If only the first bytes of a web page are needed, it is probably best to avoid looping to get the entire file. This may also prevent errors with unusually long web pages.