7.3 Exploring the HTTP Protocol

In this section we will manually exercise the HTTP protocol by pretending to be a web browser and sending HTTP commands to a web server to retrieve data. To play with the HTTP protocol, we will use one of the earliest Internet applications ever built.

The “telnet” application was first developed in 1968, and was developed according to one of the earliest standards for the Internet:

https://tools.ietf.org/html/rfc15

This standard is only five pages long and at this point, you probably can easily read and understand most of what is in the document. The telnet client application is so old that it is effectively a dinosaur, as it comes from “prehistoric” times in terms of the age of the Internet. The Internet was created in 1985 by the NSFNet project and the precursor to the NSFNet called the ARPANET was brought up in 1969. Telnet was designed and built even before the first TCP/IP network was in production.

Interestingly, the telnet application is still present in most modern operating systems. You can access telnet from the terminal (command line) in Macintosh and Linux. The telnet application was also present in Windows 95 through Windows XP, but is not included in later versions of Windows. If you have a later version of Windows, you can download and install a telnet client to do the examples in this section.

Telnet is a simple application. Run telnet from the command line (or terminal) and type the following command: telnet www.dr-chuck.com 80

The first parameter is a domain name (an IP address would work here as well) and a port to connect to on that host. We use the port to indicate which application server we would like to connect to. Port 80 is where we typically expect to find an HTTP (web) server application on a host. If there is no web server listening on port 80, our connection will time out and fail. But if there is a web server, we will be connected to that web server and whatever we type on our keyboard will be sent directly to the server.

At this point, we need to know the HTTP protocol and type the commands precisely as expected. If we don’t know the protocol, the web server will not be too friendly. Here is an example of things not going well:

 telnet www.dr-chuck.com 80
 Trying 198.251.66.43…
 Connected to www.dr-chuck.com.
 Escape character is '^]'.
 HELP
 501 Method Not Implemented … 
 Connection closed by foreign host.

We type “telnet” in the terminal requesting a connection to port 80 (the web server) on the host www.dr-chuck.com. You can see as our transport layer is looking up the domain name, finding that the actual address is “198.251.66.43”, and then making a successful connection to that server. Once we are connected, the server waits for us to type a command followed by the enter or return key. Since we don’t know the protocol, we type “HELP” and enter. The server is not pleased, gives us an error message, and then closes the connection. We do not get a second chance. If we do not know the protocol, the web server does not want to talk to us.

But let’s go back and read section 5.3.2 of the RFC-7230 document and try again to request a document using the correct syntax:

telnet www.dr-chuck.com 80
Trying 198.251.66.43...
Connected to www.dr-chuck.com.
Escape character is '^]'.
GET http://www.dr-chuck.com/page1.htm HTTP/1.0
HTTP/1.1 200 OK
Last-Modified: Sun, 19 Jan 2014 14:25:43 GMT
Content-Length: 131
Content-Type: text/html
<h1>The First Page</h1>
<p>
If you like, you can switch to the
<a href="http://www.dr-chuck.com/page2.htm">
Second Page</a>.
</p>
Connection closed by foreign host.

We make the connection to the web browser again using telnet, then we send a GET command that indicates which document we want to retrieve. We use version 1.0 of the HTTP protocol because

Hacking HTTP By Hand
Figure 7.4: Hacking HTTP By Hand

it is simpler than HTTP 1.1. Then we send a blank line by pressing “return” or “enter” to indicate that we are done with our request.

Since we have sent the proper request, the host responds with a series of headers describing the document, followed by a blank line, then it sends the actual document.

The headers communicate metadata (data about data) about the document that we have asked to retrieve. For example, the first line contains a status code.

In this example, the status code of “200” means that things went well. A status code of “404” in the first line of the headers indicates that the requested document was not found. A status code of “301” indicates that the document has moved to a new location.

The status codes for HTTP are grouped into ranges: 2XX codes indicate success, 3XX codes are for redirecting, 4XX codes indicate that the client application did something wrong, and 5xx codes indicate that the server did something wrong.

This is a HyperText Markup Language (HTML) document, so it is marked up with tags like <h1> and <p>. When your browser receives the document in HTML format, it looks at the markup in the document, interprets it, and presents you a formatted version of the document.

Back to Book’s Main Index