In this chapter, we start to put it all together by looking at the HyperText Transfer Protocol (HTTP) and how the browser interacts with the server. This interaction between the browser and server across the Internet is called the request/response cycle because the browser makes a request to the server and gets a response back, over and over, as we browse the Internet. The request/response cycle is the middle portion of Figure 4-1.
When you type a Uniform Resource Locator (URL) into the address box of your web browser and press Enter, you are asking your browser to retrieve a particular document somewhere on the Web. The URL http://www.dr-chuck.com/page1.htm can be broken down into three parts (as shown in Figure 4-2):
The first part of the URL indicates which network protocol is to be used when your browser contacts the host and requests the document. Usually the protocol is either http:// or https://, indicating HTTP or secure HTTP, respectively. Sometimes you will see a URL that starts with ftp://, indicating that the File Transfer Protocol (FTP) must be used to retrieve the document.
The second part of the URL is a host that is connected to the Internet. In this example, the hostname is www.dr-chuck.com.
The third part of the URL is the document that we are to retrieve from that host. In this example, the document is /page1.htm.
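The three-part breakdown above can be checked with a short Python sketch. It assumes the standard library's urllib.parse module, which the text does not mention; that module calls the protocol part the "scheme" and reports it without the trailing ://.

```python
from urllib.parse import urlsplit

# The example URL from the text.
url = "http://www.dr-chuck.com/page1.htm"

parts = urlsplit(url)

protocol = parts.scheme    # first part: the network protocol
host = parts.hostname      # second part: the host connected to the Internet
document = parts.path      # third part: the document on that host

print(protocol, host, document)
```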
Using the information in the URL, the browser can retrieve the requested document by following the rules of HTTP. The browser first makes a connection to the host www.dr-chuck.com on port 80, the default port for HTTP. Once the connection is established, the browser requests the document by sending the following command:
GET http://www.dr-chuck.com/page1.htm HTTP/1.1
The server running at www.dr-chuck.com receives this request, finds the document page1.htm, and returns the following HTML as the HTTP response:
<h1>The First Page</h1> <p> If you like, you can switch to the <a href="http://www.dr-chuck.com/page2.htm"> Second Page</a>. </p>
And then it closes the connection. This completes our first HTTP request/response cycle.
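As a rough sketch of one request/response cycle, the Python below opens a connection, sends a GET request, and reads until the server closes the connection. The function names (build_request, split_response, fetch) are made up for this illustration. The request here is also fuller than the one-line command shown above: it puts only the document path on the request line, adds the Host header that HTTP/1.1 requires, asks the server to close the connection afterward, and ends with a blank line. A real response likewise begins with a status line and headers before the HTML.

```python
import socket


def build_request(host, document):
    """Return the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {document} HTTP/1.1",
        f"Host: {host}",          # required by HTTP/1.1
        "Connection: close",      # ask the server to close when it is done
        "",                       # blank line marks the end of the request
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw):
    """Split a raw response into (status line, body bytes as sent)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("iso-8859-1")
    # The body is returned untouched; a server may chunk-encode it.
    return status_line, body


def fetch(host, document, port=80):
    """Perform one request/response cycle and return (status line, body)."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_request(host, document))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:          # empty read: the server closed the connection
                break
            chunks.append(data)
    return split_response(b"".join(chunks))
```

Calling fetch("www.dr-chuck.com", "/page1.htm") would carry out the cycle described above, assuming the host is reachable and still serves that page over plain HTTP on port 80.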
The browser then reads and parses the returned HTML in the HTTP response and renders a page that looks like Figure 4-3.
If you were to click on the Second Page link, the browser would look at the href value of http://www.dr-chuck.com/page2.htm and repeat the steps of an HTTP GET request by making another connection to www.dr-chuck.com on port 80 and sending the following command:
GET http://www.dr-chuck.com/page2.htm HTTP/1.1
And the server would respond with the following HTML document:
<h1>The Second Page</h1> <p> If you like, you can switch back to the <a href="page1.htm"> First Page</a>. </p>
Figure 4-4 shows this interaction, with the GET request being sent from the browser to the web server and the web server returning the HTML document as the response.
Note that in the second page, our hypertext reference (href) is simply page1.htm. If the protocol and host are omitted from the hypertext reference, the browser assumes the same protocol and host that the current document was retrieved from. This lets pages be moved more easily from host to host. If the href includes the full protocol and hostname, it is called an absolute reference; if these are omitted, it is called a relative reference because the href is assumed to be relative to the current document.
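How a relative reference picks up the protocol and host of the current document can be sketched in Python. This assumes the standard library's urljoin function, which is not part of the text, as a stand-in for what the browser does; the variable names are illustrative.

```python
from urllib.parse import urljoin

# The document the browser is currently showing (the second page).
current_document = "http://www.dr-chuck.com/page2.htm"

# A relative reference inherits the protocol and host of the current document.
relative = urljoin(current_document, "page1.htm")

# An absolute reference already names its protocol and host, so it is used as-is.
absolute = urljoin(current_document, "http://www.dr-chuck.com/page2.htm")

print(relative)
print(absolute)
```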
We can happily go back and forth between the first and second pages: with each click, the browser makes a connection to the host on port 80, sends an HTTP GET request for the document, and then displays the HTML, which is returned in the