The HTTP Protocol
HTTP is the protocol used to retrieve Web pages from a server (normally on port 80) and also to send information entered in a form back to a server. It is perhaps the most complex of the protocols mentioned in these notes. Like the other protocols, it can be explored by using Telnet as a primitive web browser, sending and receiving information according to the protocol.
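The HTTP protocol works like this:
The client opens a connection to the server and sends a request such as
GET /index.html HTTP/1.0
which gives only the path name, since the machine name is already implicit in the connection.
The server replies with a status line such as
HTTP/1.0 200 OK
followed by headers and the requested content.
In HTTP/1.0, a separate connection is used for each request.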
The response code 200 OK is the most common response, signaling that the request was successful. There are many response codes, grouped as shown below:
Response Code Grouping | General Meaning
200 - 299 | success
300 - 399 | web browser needs to go to another page
400 - 499 | client error
500 - 599 | server error
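The grouping in the table above can be expressed as a small lookup. This is only a sketch: the function name `status_group` is made up for illustration, and any code outside the four ranges in the table is simply reported as "other".

```python
def status_group(code):
    """Return the general meaning of an HTTP response code, by range."""
    if 200 <= code <= 299:
        return "success"
    if 300 <= code <= 399:
        return "web browser needs to go to another page"
    if 400 <= code <= 499:
        return "client error"
    if 500 <= code <= 599:
        return "server error"
    # Codes outside the ranges listed in the table are not classified here.
    return "other"
```

For example, `status_group(404)` falls in the 400 - 499 range and is reported as a client error.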
Some common particular response codes are:
Response Code | Meaning
200 OK | The request was successful.
301 Moved Permanently | The page has moved to a new URL.
304 Not Modified | The client asked for the page only if it has changed since its last copy, and the page has not changed.
400 Bad Request | The request has faulty syntax.
401 Unauthorized | Authorization is needed to access this page. Either the authorization is wrong or it has not been supplied.
404 Not Found | The server cannot find the page. This is a common error.
503 Service Unavailable | The server is temporarily unable to handle the request, perhaps due to maintenance or overloading.
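The standard phrases for these codes do not have to be memorized. As a sketch, Python's standard library provides the `http.HTTPStatus` enumeration, which maps a numeric code to its phrase; the helper name `describe` below is made up for illustration.

```python
from http import HTTPStatus


def describe(code):
    """Return a response code together with its standard phrase."""
    status = HTTPStatus(code)  # raises ValueError for an unknown code
    return f"{status.value} {status.phrase}"
```

Calling `describe(404)` gives the same text a server would put on its status line after the HTTP version.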
A typical request might look like this:
GET /index.html HTTP/1.0
Accept: text/html
Accept: image/gif
User-Agent: Lynx/2.4
This is a sequence of lines, in ASCII, terminated by an empty line. As we have seen, the second item on the first line is the path name. This is followed by the version of the HTTP protocol that the client understands. This line is all that is required. However, other information can be provided by the client. Each piece of information is on a separate line and takes the form:
keyword: value
For example, the line
Accept: text/html
says that the client can accept html documents, while the line
Accept: image/gif
says that the client can accept images in the Graphics Interchange Format (one of the very common image file formats used on the web). This kind of information allows the server to tailor its responses to what the client is able to process. The client can also say which web browser and version it is, as in
User-Agent: Lynx/2.4
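The request format described above (a request line, then optional keyword: value lines, then an empty line) can be assembled in a few lines of code. The sketch below is illustrative only: `build_request` is a hypothetical helper, and the headers are passed as a list of pairs because a keyword such as Accept may appear more than once. On the wire, HTTP ends each line with a carriage return and line feed.

```python
def build_request(path, headers=()):
    """Build the text of an HTTP/1.0 GET request.

    headers is a sequence of (keyword, value) pairs; a keyword may repeat.
    """
    lines = [f"GET {path} HTTP/1.0"]
    for keyword, value in headers:
        lines.append(f"{keyword}: {value}")
    # Each line ends with CRLF, and an empty line terminates the request.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_request(
    "/index.html",
    [("Accept", "text/html"), ("Accept", "image/gif"), ("User-Agent", "Lynx/2.4")],
)
```

With no headers at all, the result is just the request line followed by the empty line, which is the minimum the text describes.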
There are also other request types in addition to GET. For example, HEAD retrieves only the file header, so that the browser can see whether it has been updated since it last retrieved a copy, and POST is used in conjunction with forms and CGI (the Common Gateway Interface protocol).
A typical response consists of a number of header lines, followed by an empty line, followed by the contents of the file - usually in the form of HTML. For example, we might get this:
HTTP/1.1 200 OK
Date: Mon, 12 Jul 1999 12:42:22 GMT
Server: Apache/1.3.6 (Unix)
Last-Modified: Wed, 07 Jul 1999 17:14:42 GMT
ETag: "fcdd-17e-37838b02"
Accept-Ranges: bytes
Content-Length: 382
Connection: close
Content-Type: text/html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
etc.
The first line gives the HTTP version number and a response code (see above). The third line is the name of the server program and version number. The last line of the header specifies the type of content being returned.
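To make the layout of a response concrete, here is a minimal sketch that splits a raw response into its status line, header lines, and content. The function name `parse_response` is hypothetical, the sample is a shortened version of the response shown above, and the code assumes a well-formed response with the usual carriage return and line feed line endings.

```python
def parse_response(raw):
    """Split a raw HTTP response (bytes) into its parts.

    Returns (version, code, reason, headers, body), where headers is a
    dict keyed by lower-cased header name.
    """
    # The first empty line separates the header from the contents.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # First line: HTTP version, response code, and reason phrase.
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Mon, 12 Jul 1999 12:42:22 GMT\r\n"
    b"Server: Apache/1.3.6 (Unix)\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<HTML></HTML>"
)
version, code, reason, headers, body = parse_response(sample)
```

Storing the headers in a dict keeps only the last value if a header name repeats, which is adequate for a sketch like this one.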
Exercise:
Begin by trying to replicate the following, and then trying the same thing with another server or two. Note that you only have to enter the telnet command, the line beginning with GET, and the blank line immediately following the GET line. Use upper case as shown. GET is only one of several HTTP commands. You might choose to type some file name other than the one shown (the index file). Study the reply carefully. The meanings of any error messages might be available from the tables given above.
If the web page you are retrieving is a long one, it may be difficult to display if your Telnet program does not allow its window to be scrolled. In that case you might want to switch on logging before connecting. Afterwards you can display the dialogue using any editor. You should see several lines of heading, followed by the "raw" HTML of the Web page.
$ telnet cs.smu.ca 80
Trying 140.184.76.9...
Connected to cs.stmarys.ca.
Escape character is '^]'.
GET /index.html HTTP/1.0

HTTP/1.1 200 OK
Date: Fri, 21 Mar 2003 15:34:52 GMT
Server: Apache/1.3.26 (Unix) Debian GNU/Linux PHP/4.1.2
Connection: close
Content-Type: text/html; charset=iso-8859-1

<html>
<head>
<meta http-equiv="Refresh" content="1;URL=http://www.stmarys.ca/academic/science/compsci">
</head>
<body>
If you are not automatically forwarded click <a href="http://www.stmarys.ca/academic/science/compsci">here.</a>
</body>
</html>
Connection closed by foreign host.
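The same dialogue can be carried out from a short program instead of Telnet. The sketch below uses Python's standard `socket` module; the function names `send_request`, `read_reply`, and `fetch` are made up for illustration. One addition is not part of the session above: the request includes a Host line, on the assumption that the server hosts several sites and needs it to pick the right one, as many current servers do.

```python
import socket


def send_request(sock, path, host):
    """Send a minimal HTTP/1.0 GET request over a connected socket."""
    # The Host line is an addition not present in the Telnet session.
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
    sock.sendall(request.encode("ascii"))


def read_reply(sock):
    """Read everything the server sends until it closes the connection."""
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # an empty read means the other side has closed
            break
        chunks.append(data)
    return b"".join(chunks)


def fetch(host, path="/index.html", port=80):
    """Connect, send one request, and return the raw reply as bytes."""
    with socket.create_connection((host, port), timeout=10) as sock:
        send_request(sock, path, host)
        return read_reply(sock)
```

Calling `fetch("cs.smu.ca")` and printing the decoded result should show header lines followed by raw HTML, much as in the Telnet session, although what the server returns today may differ from the 2003 transcript.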