Now that we've sketched what HTTP's messages look like, let's talk for a moment about how messages move from place to place, across Transmission Control Protocol (TCP) connections.
HTTP is an application layer protocol. HTTP doesn't worry about the nitty-gritty details of network communication; instead, it leaves the details of networking to TCP/IP, the popular reliable Internet transport protocol. TCP provides:
Error-free data transportation
In-order delivery (data will always arrive in the order in which it was sent)
Unsegmented data stream (can dribble out data in any size at any time)
The Internet itself is based on TCP/IP, a popular layered set of packet-switched network protocols spoken by computers and network devices around the world. TCP/IP hides the peculiarities and foibles of individual networks and hardware, letting computers and networks of any type talk together reliably.
Once a TCP connection is established, messages exchanged between the client and server computers will never be lost, damaged, or received out of order.
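To make these guarantees concrete, here is a minimal Python sketch, assuming a loopback connection on 127.0.0.1 is available. The sender dribbles a request out in arbitrary-sized pieces, and the receiver reads one continuous, in-order stream of bytes. The function names (`send_in_pieces`, `receive_all`, `demo`) are made up for this illustration.

```python
import socket
import threading


def send_in_pieces(port, pieces):
    """Connect to the local listener and write the data in arbitrary-sized pieces."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        for piece in pieces:
            conn.sendall(piece)
    # Leaving the with-block closes the connection; the reader sees end of stream.


def receive_all(listener):
    """Accept one connection and read until the peer closes it."""
    conn, _addr = listener.accept()
    received = bytearray()
    with conn:
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # empty read means the sender closed the connection
                break
            received += chunk
    return bytes(received)


def demo():
    # The piece boundaries are deliberately awkward; TCP does not preserve them.
    pieces = [
        b"GET /to",
        b"ols.html HTTP/1.1\r\nHo",
        b"st: www.joes-hardware.com\r\n",
        b"\r\n",
    ]
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        sender = threading.Thread(target=send_in_pieces, args=(port, pieces))
        sender.start()
        data = receive_all(listener)
        sender.join()
    return pieces, data


if __name__ == "__main__":
    pieces, data = demo()
    print(data == b"".join(pieces))
```

The receiver cannot tell where one piece ended and the next began. It sees only the bytes, complete and in the order they were sent.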
In networking terms, the HTTP protocol is layered over TCP. HTTP uses TCP to transport its message data. Likewise, TCP is layered over IP (see Figure 1-9).
Setting up a TCP connection is sort of like calling someone at a corporate office. First, you dial the company's phone number. This gets you to the right organization. Then, you dial the specific extension of the person you're trying to reach.
In TCP, you need the IP address of the server computer and the TCP port number associated with the specific software program running on the server.
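The phone-extension analogy can be sketched in Python, assuming the loopback address 127.0.0.1 is usable. Two toy services share one IP address, and the port number alone decides which program answers. The service names and the helpers `start_service` and `call` are hypothetical, chosen only for this illustration.

```python
import socket
import threading


def start_service(name):
    """Listen on 127.0.0.1 at an OS-chosen port, greet one caller, then stop."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: the OS assigns a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve_once():
        conn, _addr = listener.accept()
        with conn:
            conn.sendall(name.encode("ascii"))
        listener.close()

    threading.Thread(target=serve_once, daemon=True).start()
    return port


def call(ip_address, port):
    """Open a TCP connection to (ip_address, port) and read until it closes."""
    with socket.create_connection((ip_address, port), timeout=5) as conn:
        data = b""
        while True:
            chunk = conn.recv(1024)
            if not chunk:
                break
            data += chunk
    return data.decode("ascii")


if __name__ == "__main__":
    web_port = start_service("web server")
    mail_port = start_service("mail server")
    # Same "phone number" (IP address), different "extensions" (ports).
    print(call("127.0.0.1", web_port))
    print(call("127.0.0.1", mail_port))
```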
This is all well and good, but how do you get the IP address and port number of the HTTP server in the first place? Why, the URL, of course! We mentioned before that URLs are the addresses for resources, so naturally enough they can provide us with the IP address for the machine that has the resource. Let's take a look at a few URLs:
http://220.127.116.11:80/index.html
http://www.netscape.com:80/index.html
http://www.netscape.com/index.html
The first URL has the machine's IP address, "220.127.116.11", and port number, "80".
The second URL doesn't have a numeric IP address; it has a textual domain name, or hostname ("www.netscape.com"). The hostname is just a human-friendly alias for an IP address. Hostnames can easily be converted into IP addresses through a facility called the Domain Name System (DNS), so we're all set here, too. The third URL has no port number; when the port is missing from an HTTP URL, the default of port 80 applies. We will talk much more about DNS and URLs in Chapter 2.
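As a rough illustration of pulling the host and port out of these URLs, the sketch below uses `urlsplit` from Python's standard library. The helper name `host_and_port` and the constant `DEFAULT_HTTP_PORT` are made up for this example, and it performs no DNS lookup: a hostname comes back as a hostname.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80  # used when an http:// URL names no port


def host_and_port(url):
    """Return (host, port) for an http:// URL, falling back to port 80."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_HTTP_PORT
    return parts.hostname, port


if __name__ == "__main__":
    for url in (
        "http://220.127.116.11:80/index.html",
        "http://www.netscape.com:80/index.html",
        "http://www.netscape.com/index.html",
    ):
        print(url, "->", host_and_port(url))
```

All three URLs end up with port 80; only the first already carries a numeric address, so the other two would still need a DNS lookup before a connection could be opened.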
With the IP address and port number, a client can easily communicate via TCP/IP. Figure 1-10 shows how a browser uses HTTP to display a simple HTML resource that resides on a distant server.
Here are the steps:
The browser extracts the server's hostname from the URL.
The browser converts the server's hostname into the server's IP address.
The browser extracts the port number (if any) from the URL.
The browser establishes a TCP connection with the web server.
The browser sends an HTTP request message to the server.
The server sends an HTTP response back to the browser.
The connection is closed, and the browser displays the document.
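The steps above can be sketched in a few lines of Python. This is a simplified illustration, not how a real browser is built: it handles only plain http:// URLs, resolves IPv4 addresses only, and adds a "Connection: close" header (not part of the steps above) so that it can detect the end of the response by reading until the server closes the connection. The names `build_request` and `fetch` are hypothetical.

```python
import socket
from urllib.parse import urlsplit


def build_request(url):
    """Return (hostname, port, request_bytes) for a simple GET of url."""
    parts = urlsplit(url)
    hostname = parts.hostname                               # extract the hostname
    port = parts.port if parts.port is not None else 80     # extract the port, if any
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    host_header = hostname if port == 80 else f"{hostname}:{port}"
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n"
        "\r\n"                                              # blank line ends the request
    ).encode("ascii")
    return hostname, port, request


def fetch(url):
    """Carry out one HTTP transaction and return the raw response bytes."""
    hostname, port, request = build_request(url)
    ip_address = socket.gethostbyname(hostname)             # hostname -> IPv4 address
    with socket.create_connection((ip_address, port), timeout=10) as conn:
        conn.sendall(request)                               # send the HTTP request
        response = bytearray()
        while True:                                         # read the HTTP response
            chunk = conn.recv(4096)
            if not chunk:                                   # server closed the connection
                break
            response += chunk
    return bytes(response)                                  # ready to be displayed
```

Calling `fetch` with a URL would perform the whole exchange over the network, while `build_request` can be inspected on its own to see exactly which bytes the client would send.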
The Telnet utility connects your keyboard to a destination TCP port and connects the TCP port output back to your display screen. Telnet is commonly used for remote terminal sessions, but it can generally connect to any TCP server, including HTTP servers.
You can use the Telnet utility to talk directly to web servers. Telnet lets you open a TCP connection to a port on a machine and type characters directly into the port. The web server treats you as a web client, and any data sent back on the TCP connection is displayed onscreen.
Let's use Telnet to interact with a real web server. We will use Telnet to fetch the document pointed to by the URL http://www.joes-hardware.com:80/tools.html (you can try this example yourself).
Let's review what should happen:
First, we need to look up the IP address of www.joes-hardware.com and open a TCP connection to port 80 on that machine. Telnet does this legwork for us.
Once the TCP connection is open, we need to type in the HTTP request.
When the request is complete (indicated by a blank line), the server should send back the content in an HTTP response and close the connection.
Example 1-1. An HTTP transaction using telnet
    % telnet www.joes-hardware.com 80
    Trying 22.214.171.124...
    Connected to joes-hardware.com.
    Escape character is '^]'.
    GET /tools.html HTTP/1.1
    Host: www.joes-hardware.com

    HTTP/1.1 200 OK
    Date: Sun, 01 Oct 2000 23:25:17 GMT
    Server: Apache/1.3.11 BSafe-SSL/1.38 (Unix) FrontPage/126.96.36.199
    Last-Modified: Tue, 04 Jul 2000 09:46:21 GMT
    ETag: "373979-193-3961b26d"
    Accept-Ranges: bytes
    Content-Length: 403
    Connection: close
    Content-Type: text/html

    <HTML>
    <HEAD><TITLE>Joe's Tools</TITLE></HEAD>
    <BODY>
    <H1>Tools Page</H1>
    <H2>Hammers</H2>
    <P>Joe's Hardware Online has the largest selection of hammers on the earth.</P>
    <H2><A NAME=drills></A>Drills</H2>
    <P>Joe's Hardware has a complete line of cordless and corded drills, as well as the latest in plutonium-powered atomic drills, for those big around the house jobs.</P>
    ...
    </BODY>
    </HTML>
    Connection closed by foreign host.
Telnet looks up the hostname and opens a connection to the www.joes-hardware.com web server, which is listening on port 80. The three lines after the command are output from Telnet, telling us it has established a connection.
We then type in our basic request command, "GET /tools.html HTTP/1.1", and a Host header providing the original hostname, followed by a blank line. Together, these ask the server www.joes-hardware.com to GET us the resource "/tools.html". After that, the server responds with a response line, several response headers, a blank line, and finally the body of the HTML document.
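To see that structure programmatically, here is a small Python sketch that splits a raw response into its response line, headers, and body. It is deliberately naive (no folded or repeated headers, no chunked bodies), the function name `split_response` is made up for this illustration, and the sample bytes are a shortened version of the response in Example 1-1.

```python
def split_response(raw):
    """Split raw HTTP response bytes into (version, status, reason, headers, body)."""
    # The first blank line (CRLF CRLF) separates the headers from the body.
    head, _sep, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Response line: version, status code, and an optional reason phrase.
    fields = lines[0].split(" ", 2)
    version = fields[0]
    status = int(fields[1])
    reason = fields[2] if len(fields) > 2 else ""

    # Each header is "Name: value"; split on the first colon only.
    headers = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, status, reason, headers, body


SAMPLE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Sun, 01 Oct 2000 23:25:17 GMT\r\n"
    b"Content-Length: 403\r\n"
    b"Connection: close\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<HTML>...</HTML>"
)

if __name__ == "__main__":
    version, status, reason, headers, body = split_response(SAMPLE)
    print(version, status, reason)
    print(headers["content-type"])
    print(body)
```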
Beware that Telnet mimics HTTP clients well but doesn't work well as a server. And automated Telnet scripting is no fun at all. For a more flexible tool, you might want to check out nc (netcat). The nc tool lets you easily manipulate and script UDP- and TCP-based traffic, including HTTP. See http://www.bgw.org/tutorials/utilities/nc.php for details.