Programming Using URLs

URLs have an important role within the Internet. Even novice users of the web know about URLs. In Java, a URL can be used by a program to retrieve information, without the need for any more detailed understanding of the Internet.


Example - A Primitive Web Browser

It is easy to write a Web browser in Java to read and display information from a remote host computer, given a URL. The following application does this. Type a URL into the text field, including the http:// part, and click the display button. The contents of the web page are displayed, as raw text, in the text area.

This is the second way of creating a "web browser". The first was using a telnet program. Another method is yet to come.

// MiniBrowser.java
// Retrieves a file from a web server as raw text.

import java.awt.*;
import java.awt.event.*;
import java.net.*;
import java.io.*;

public class MiniBrowser
    extends Frame
    implements ActionListener
{
    private TextField txtInput;
    private TextArea contents;
    private Button btnDisplay;
    private Button exit;

    public static void main(String [] args)
    {
        MiniBrowser m = new MiniBrowser();
        m.makeGUI();
        m.setSize(550, 400);
        m.setVisible(true);
    }

    public void makeGUI()
    {
        setLayout(new FlowLayout());

        txtInput = new TextField(50);
        add(txtInput);
        btnDisplay = new Button("Display page at this URL");
        add(btnDisplay);

        exit = new Button("Exit");
        add(exit);
        exit.addActionListener(this);

        contents =
            new TextArea("", 20, 60, TextArea.SCROLLBARS_VERTICAL_ONLY);
        add(contents);
        btnDisplay.addActionListener(this);
    }

    public void actionPerformed(ActionEvent event)
    {
        if (event.getSource() == exit)
            System.exit(0);

        String line;
        String location = txtInput.getText();

        try
        {
            URL url = new URL(location);
            BufferedReader input = new
                BufferedReader(new InputStreamReader(url.openStream()));
            while ((line = input.readLine()) != null)
            {
                 contents.append(line);
                 contents.append("\n");  // readLine() strips the line ending, so put it back
            }
            input.close();
        }
        catch (MalformedURLException e)
        {
            contents.setText("Invalid URL Format");
        }
        catch (IOException io)
        {
            contents.setText(io.toString());
        }
    }
}

The program first creates a URL object by calling a constructor method of the class URL, supplying the string version of the URL as the parameter. If there is something wrong with the syntax of the URL, an exception is raised and the program displays an error message.
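
The syntax check can be tried on its own, without any network access. The following is a minimal sketch; the class UrlCheck and the method isWellFormed are hypothetical names, not part of MiniBrowser. It uses the same constructor as the program above. Note that recent JDK releases mark the URL constructors as deprecated, so the compiler may print a warning.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class for illustration; not part of MiniBrowser.
class UrlCheck
{
    // Returns true if the string can be parsed into a URL object.
    static boolean isWellFormed(String location)
    {
        try
        {
            new URL(location);   // only parses the text; no connection is made
            return true;
        }
        catch (MalformedURLException e)
        {
            return false;        // e.g. the protocol part is missing
        }
    }

    public static void main(String[] args)
    {
        System.out.println(isWellFormed("http://www.shu.ac.uk"));
        System.out.println(isWellFormed("www.shu.ac.uk"));   // no http:// part
    }
}
```

A well-formed URL is not necessarily a reachable one: a host that does not exist is only discovered later, when the stream is opened, and is reported as an IOException.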


Streams

Accessing information across the Internet is accomplished using library classes for streams. These are the same classes that are used to read and write information from files. Thus reading or writing to a network or the Internet is "just" like reading or writing to a serial file. Not that that is particularly straightforward or trivial! There are about 60 Java library classes providing stream access. The trick is to select the appropriate class. Some of them are of no use for Internet access; for example, those that do random access or provide line numbers on input streams. Some of them deal with data as binary data and others treat a stream as character data.

Within the program, an input stream is created:

InputStream is = url.openStream();

This is a connection to the URL. We could now do input directly using this object, but it is more convenient to create some other stream objects:

InputStreamReader isr = new InputStreamReader(is);
BufferedReader input = new BufferedReader(isr);

BufferedReader is a class that supports character input, rather than binary input, because the assumption is that the data at the desired URL is going to be characters, which will be the case if it's HTML. The InputStreamReader in between converts the bytes of the input stream into characters. Moreover, we can process a whole line at a time with BufferedReader.

A while loop inputs a line at a time using the method readLine() from the class BufferedReader. Each line is appended to the text area. This continues until there are no more lines, i.e., until a "null line" is encountered. Since readLine strips end-of-line characters, they have to be put back.
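
The reading loop can be tried without a network connection by wrapping the same reader classes around an in-memory stream. This is a sketch only: the class LineReaderDemo and the method readLines are hypothetical names, and the UTF-8 character set is an assumption about the data (the browser above uses the platform default).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it mirrors the loop in MiniBrowser.
class LineReaderDemo
{
    // Reads every line from the stream and puts a "\n" back after each one.
    static String readLines(InputStream is) throws IOException
    {
        // Assumption: the bytes are UTF-8 encoded text.
        BufferedReader input =
            new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
        StringBuilder text = new StringBuilder();
        String line;
        while ((line = input.readLine()) != null)   // null means end-of-stream
        {
            text.append(line);
            text.append("\n");   // readLine() removed the original line ending
        }
        input.close();
        return text.toString();
    }

    public static void main(String[] args) throws IOException
    {
        // Stand-in for url.openStream(): a small page held in memory.
        byte[] page = "<html>\r\n<body>Hello</body>\r\n</html>"
            .getBytes(StandardCharsets.UTF_8);
        System.out.print(readLines(new ByteArrayInputStream(page)));
    }
}
```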

So, streams are a zoo. There are so many of them that you have to be careful to use the most appropriate ones.

To use an input stream, remember that, as always, you have to:

  1. Open it (i.e., create the stream object)
  2. Read data from it, usually in a loop
  3. Test for the end-of-stream
  4. Close it when you're finished
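
The four steps above can be sketched as follows. The class StreamSteps and the method countLines are hypothetical names. To avoid depending on a network connection, the example reads from a temporary file through a file: URL, which is an assumption made for the demonstration only; an http:// URL is opened in the same way. The try-with-resources statement is one way of making sure that step 4 always happens.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class illustrating open, read, test, close.
class StreamSteps
{
    // Counts the lines of text available at the given URL.
    static int countLines(URL url) throws IOException
    {
        int count = 0;
        // 1. Open: create the stream objects.
        try (BufferedReader input =
                 new BufferedReader(new InputStreamReader(url.openStream())))
        {
            // 2. Read in a loop. 3. readLine() returns null at end-of-stream.
            while (input.readLine() != null)
            {
                count++;
            }
        }   // 4. Close: done automatically here, even if an exception is thrown.
        return count;
    }

    public static void main(String[] args) throws IOException
    {
        // A temporary file stands in for a remote web page.
        Path temp = Files.createTempFile("page", ".html");
        Files.write(temp, "<html>\n<body>\n</html>\n".getBytes());
        System.out.println(countLines(temp.toUri().toURL()));
        Files.delete(temp);
    }
}
```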


Remarks

Note that this program uses a high level Java API. It uses only a URL, typed in by the user, to retrieve a web page from a remote host. (For example, there is no mention of TCP, IP, IP addresses or sockets, which are all used by this program.) This demonstrates how Java has been designed to carry out Internet tasks easily. The Java library classes used by this program know about and use the HTTP protocol. Thus, for example, the HTTP header has been stripped from the information sent from the web server to the client. The classes know about the protocol, but not about the content, which is displayed as raw HTML.

This web browser assumes that the data retrieved from the site is text. If the data expected was a GIF file or a Java class file, some other class would have to be used to input the data. Also, this program does, of course, assume that there is a Web server program running on the server to retrieve and return the information to this client, according to the HTTP protocol.
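
For binary data such as a GIF file, one possible approach is to skip the reader classes and read bytes directly from the InputStream. The following is a sketch under that assumption; the class BinaryReadDemo and the method readBytes are hypothetical names, and an in-memory stream stands in for url.openStream() so that the example runs without a network.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: reads a stream as bytes rather than characters.
class BinaryReadDemo
{
    // Copies everything from the stream into a byte array.
    static byte[] readBytes(InputStream is) throws IOException
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = is.read(buffer)) != -1)   // -1 means end-of-stream
        {
            out.write(buffer, 0, n);
        }
        is.close();
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException
    {
        // Stand-in data: the six-byte signature that begins a GIF89a file.
        byte[] fake = "GIF89a".getBytes(StandardCharsets.US_ASCII);
        byte[] data = readBytes(new ByteArrayInputStream(fake));
        System.out.println(data.length + " bytes read");
    }
}
```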


So, Java provides a class URL, with constructor methods to create an object, and access methods to retrieve portions of the URL from such an object.

Class name:

URL

Import:

import java.net.*;


Summary of class URL

Each entry gives the method, its description and an example.

Constructors:

public URL(String url)
    Description: Creates a new URL object corresponding to the string supplied.
    Example:     URL url = new URL("http://www.shu.ac.uk");

public URL(String protocol, String host, int port, String file)
    Description: Creates a new URL ...
    Example:     URL url = new URL("http", "www.shu.ac.uk", 80, "/default.htm");

Object Methods:

public String getProtocol()
    Description: Returns the protocol as a string.
    Example:     String protocol = url.getProtocol();

public String getHost()
    Description: Returns the host name part of the URL as a string.
    Example:     String name = url.getHost();

public int getPort()
    Description: Returns the port number as an int, or -1 if the URL does not specify a port.
    Example:     int port = url.getPort();

public String getFile()
    Description: Returns the path and file name part of a URL as a string.
    Example:     String name = url.getFile();

public final InputStream openStream()
    Description: Makes a connection to the resource specified by the URL. Returns an InputStream object from which data can be read.
    Example:     InputStream is = url.openStream();

public String toString()
    Description: Returns the URL as a string.
    Example:     String URLName = url.toString();
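
The access methods can be tried together in a short program. This is a minimal sketch; the class UrlParts and the method describe are hypothetical names, and the URL is the example one used in this summary. No connection is made, since only openStream() contacts the server.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical demo class showing the URL access methods.
class UrlParts
{
    // Builds a one-line description from the parts of a URL.
    static String describe(URL url)
    {
        return "protocol=" + url.getProtocol()
             + " host=" + url.getHost()
             + " port=" + url.getPort()
             + " file=" + url.getFile();
    }

    public static void main(String[] args) throws MalformedURLException
    {
        URL full = new URL("http", "www.shu.ac.uk", 80, "/default.htm");
        System.out.println(describe(full));
        System.out.println(full.toString());   // the whole URL as one string

        // With no port in the URL, getPort() returns -1 rather than 80.
        URL bare = new URL("http://www.shu.ac.uk");
        System.out.println(bare.getPort());
    }
}
```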


Exercises: In these exercises your goals should be:

  1. To understand how easily a web browser can be constructed in Java
  2. To assess the limitations of this browser
  3. To understand the Java library class URL
  • Run this primitive web browser program and explore the various exceptions that can arise.
  • Identify the limitations of this browser and suggest enhancements that could be made.
  • Add a default (home) page so that a particular URL is displayed when the program is run. Add a new (home) button to revert to this home page.
  • Enhance the web browser so that it clears the text area before displaying a new web page.
  • Modify the web browser so that it takes some account of the content of the page that is downloaded. For example, you could try making it display only those lines that do not contain either of the characters > or <. To accomplish this use the String method indexOf(). It returns the integer -1 if the string given as a parameter is not in the string calling the method. So, for example, you could use code something like

    if (myString.indexOf(">") == -1)
    {
        // ">" is not in myString
    }
    else
    {
        // ">" is in myString
    }

    to process the file.