How HTTP Clients Work

Once the server is set up, we can get down to business. The client has the easy end: it wants web action on a particular site, and it sends a request with a URL that begins with http to indicate which service it wants (other common services are ftp for File Transfer Protocol, or https for HTTP with Secure Sockets Layer, SSL). The URL then continues with these possible parts:

 //<user>:<password>@<host>:<port>/<url-path>

RFC 1738 says:

Some or all of the parts “<user>:<password>@”, “:<password>”, “:<port>”, and “/<url-path>” may be omitted. The scheme specific data start with a double slash “//” to indicate that it complies with the common Internet scheme syntax.
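As a rough illustration of how these parts break down, the following Python sketch uses the standard library's urllib.parse.urlsplit. The URL is made up for the purpose: the user name and password are hypothetical, and the host, port, and path are only examples.

```python
from urllib.parse import urlsplit

# Hypothetical URL that uses every optional part; not a real account.
url = "http://alice:secret@www.apache.org:8000/some/where/foo.html"

parts = urlsplit(url)

print(parts.scheme)    # http
print(parts.username)  # alice
print(parts.password)  # secret
print(parts.hostname)  # www.apache.org
print(parts.port)      # 8000
print(parts.path)      # /some/where/foo.html
```

If a part is left out of the URL, urlsplit reports it as None (user, password, port) or as an empty string (path).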

In real life, URLs look more like http://www.apache.org/. That is, there is no user and password pair, and there is no port. What happens?

The browser observes that the URL starts with http: and deduces that it should be using the HTTP protocol. The client then contacts a name server, which uses DNS to resolve www.apache.org to an IP address. At the time of writing, this was 63.251.56.142. One way to check the validity of a hostname is to go to the operating-system prompt[8] and type:

ping www.apache.org

If that host is connected to the Internet, a response is returned:

Pinging www.apache.org [63.251.56.142] with 32 bytes of data:

Reply from 63.251.56.142: bytes=32 time=278ms TTL=49
Reply from 63.251.56.142: bytes=32 time=620ms TTL=49
Reply from 63.251.56.142: bytes=32 time=285ms TTL=49
Reply from 63.251.56.142: bytes=32 time=290ms TTL=49

Ping statistics for 63.251.56.142:
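The name lookup that ping performs can also be done from a program. The sketch below is one way to do it in Python with socket.getaddrinfo; the function name resolve is our own, and we restrict the lookup to IPv4 only to match the addresses shown in the text.

```python
import socket

def resolve(hostname):
    """Return the IPv4 addresses the system resolver reports for hostname."""
    infos = socket.getaddrinfo(
        hostname, 80, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses = []
    for _family, _socktype, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]          # sockaddr is (ip, port) for IPv4
        if ip not in addresses:   # drop duplicates, keep resolver order
            addresses.append(ip)
    return addresses

# Example call (needs a working Internet connection):
#     print(resolve("www.apache.org"))
```

The addresses returned today will almost certainly differ from the one printed above, since sites change hosts over the years.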

A URL can be given more precision by attaching a port number: the web address http://www.apache.org doesn’t include a port because it uses port 80, the default, and the browser takes it for granted. If some other port is wanted, it is included in the URL after a colon, for example, http://www.apache.org:8000/. We will have more to do with ports later.

The URL always includes a path, even if it is only /. If the path is left out by the careless user, most browsers put it back in. If the path were /some/where/foo.html on port 8000, the URL would be http://www.apache.org:8000/some/where/foo.html.
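The way a client fills in the missing port and path might be sketched as follows. This is only an approximation of what a browser does, not its actual code: the function name connection_target and the DEFAULT_PORTS table are our own, and the table lists the well-known default ports for the three schemes mentioned earlier.

```python
from urllib.parse import urlsplit

# Well-known default ports for the schemes mentioned in the text.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def connection_target(url):
    """Return (host, port, path), supplying the defaults a browser assumes."""
    parts = urlsplit(url)
    # No explicit port in the URL: fall back to the scheme's default.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    # No path in the URL: put the "/" back in.
    path = parts.path or "/"
    return parts.hostname, port, path

print(connection_target("http://www.apache.org"))
print(connection_target("http://www.apache.org:8000/some/where/foo.html"))
```

The first call yields port 80 and the path /, while the second keeps the port and path given in the URL.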

The client now makes a TCP connection to port number 8000 on the IP address it resolved (63.251.56.142 in the example above) and sends the following message down the connection (if it is using HTTP 1.0):

GET /some/where/foo.html HTTP/1.0<CR><LF><CR><LF>

These carriage returns and line feeds (CRLF) are very important: the first pair ends the request line, and the second, which forms an empty line, marks the end of the HTTP header and separates it from the body. If the request were a POST, there would be data following. The server sends the response back and closes the connection. To see it in action, connect again to the Internet, get a command-line prompt, and type the following:

% telnet www.apache.org 80

> telnet www.apache.org 80
            
GET http://www.apache.org/foundation/contact.html HTTP/1.1
Host: www.apache.org

On Win98, telnet puts up a dialog box. Click connect remote system, and change Port from “telnet” to “80”. In Terminal preferences, check “local echo”. Then type this, followed by two Returns:

GET http://www.apache.org/foundation/contact.html HTTP/1.1
Host: www.apache.org

You should see text similar to that which follows.

Some implementations of telnet rather unnervingly don’t echo what you type to the screen, so it seems that nothing is happening. Nevertheless, a whole mess of response streams past:

Trying 64.125.133.20...
Connected to www.apache.org.
Escape character is '^]'.
HTTP/1.1 200 OK
Date: Mon, 25 Feb 2002 15:03:19 GMT
Server: Apache/2.0.32 (Unix)
Cache-Control: max-age=86400
Expires: Tue, 26 Feb 2002 15:03:19 GMT
Accept-Ranges: bytes
Content-Length: 4946
Content-Type: text/html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
               "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
 <head>
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
      <title>Contact Information--The Apache Software Foundation</title>
 </head>
 <body bgcolor="#ffffff" text="#000000" link="#525D76">        
  <table border="0" width="100%" cellspacing="0">
   <tr><!-- SITE BANNER AND PROJECT IMAGE -->
    <td align="left" valign="top">
<a href="http://www.apache.org/"><img src="../images/asf_logo_wide.gif" alt="The 
Apache Software Foundation" align="left" border="0"/></a>
</td>
   </tr>
  </table>
  <table border="0" width="100%" cellspacing="4">
   <tr><td colspan="2"><hr noshade="noshade" size="1"/></td></tr>
   <tr>
    <!-- LEFT SIDE NAVIGATION -->
    <td valign="top" nowrap="nowrap">
           <p><b><a href="/foundation/projects.html">Apache Projects</a></b></p>
    <menu compact="compact">
          <li><a href="http://httpd.apache.org/">HTTP Server</a></li>
          <li><a href="http://apr.apache.org/">APR</a></li>
          <li><a href="http://jakarta.apache.org/">Jakarta</a></li>
          <li><a href="http://perl.apache.org/">Perl</a></li>
          <li><a href="http://php.apache.org/">PHP</a></li>
          <li><a href="http://tcl.apache.org/">TCL</a></li>
          <li><a href="http://xml.apache.org/">XML</a></li>
          <li><a href="/foundation/conferences.html">Conferences</a></li>
          <li><a href="/foundation/">Foundation</a></li>
        </menu>
...... and so on
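The telnet session above can be approximated with a few lines of Python. Treat this as a sketch under stated assumptions rather than a finished client: the function names build_request, split_response, and fetch are our own; we add a Connection: close header (not typed in the session above) so that the server closes the connection and the read loop ends; and we send only the path in the request line instead of the full URL.

```python
import socket

def build_request(host, target):
    """Build an HTTP/1.1 GET request much like the one typed into telnet."""
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # assumption: ask the server to close when done
        "",                   # empty line: end of the header
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def split_response(raw):
    """Split raw response bytes into (status line, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line ends the header
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body

def fetch(host, port, target):
    """Send the request over a TCP connection and read until the server closes."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, target))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:      # empty read: the server closed the connection
                break
            chunks.append(chunk)
    return split_response(b"".join(chunks))

# Example call (needs a working Internet connection):
#     status, headers, body = fetch("www.apache.org", 80, "/foundation/contact.html")
```

The status line, headers, and page you get back today will not match the 2002 response shown above, but the shape is the same: a status line, a block of headers, an empty line, and then the body.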


[8] The operating-system prompt is likely to be “>” (Win95) or “%” (Unix). When we say, for instance, “Type % ping,” we mean, “When you see '%', type ‘ping’.”
