If I were classifying marine animals, I’d start by talking about the things they have in common: DNA, cellular structure, the laws of embryonic development. Then I’d show how animals distinguish themselves from each other by specializing away from the common ground. To classify the programmable web, I’d like to start off with an overview of HTTP, the protocol that all web services have in common.
HTTP is a document-based protocol, in which the client puts a
document in an envelope and sends it to the server. The server returns
the favor by putting a response document in an envelope and sending it
to the client. HTTP has strict standards for what the envelopes should
look like, but it doesn’t much care what goes inside. Example 1-5 shows a sample envelope: the HTTP request my web
browser sends when I visit the homepage of
oreilly.com. I’ve truncated two lines to make
the text fit on the printed page.
Example 1-5. An HTTP GET request for http://www.oreilly.com/index.html
GET /index.html HTTP/1.1
Host: www.oreilly.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12)...
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,...
Accept-Language: us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-15,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
In case you’re not familiar with HTTP, now is a good time to point out the major parts of the HTTP request. I use these terms throughout the book.
The name of the HTTP method is like a method name in a
programming language: it indicates how the client expects the
server to process this envelope. In this case, the client (my
web browser) is trying to GET some information from the server.
The path is the portion of the URI to the right of the hostname: here, http://www.oreilly.com/index.html becomes “/index.html.” In terms of the envelope metaphor, the path is the address on the envelope. In this book I sometimes refer to the “URI” as shorthand for just the path.
The request headers are bits of metadata: key-value pairs that act
like informational stickers slapped onto the envelope. This request
has eight headers: Host, User-Agent, Accept, and so on. There’s a
standard list of HTTP headers (see Appendix C), and
applications can define their own.
The entity-body is the document inside the envelope. This particular request has no entity-body, which means the envelope is empty! This is typical for a GET request, where all the information needed to complete the request is in the path and the headers.
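As a rough illustration of the parts just described, here is a minimal Python sketch that splits the request from Example 1-5 into its method, path, headers, and entity-body. The function name `parse_request` is hypothetical, and the parsing is deliberately naive: it assumes well-formed input with CRLF line endings and is not a substitute for a real HTTP library.

```python
# The request from Example 1-5. The "..." marks are the truncations
# present in the printed example, kept as-is.
RAW_REQUEST = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: www.oreilly.com\r\n"
    "User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12)...\r\n"
    "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,...\r\n"
    "Accept-Language: us,en;q=0.5\r\n"
    "Accept-Encoding: gzip,deflate\r\n"
    "Accept-Charset: ISO-8859-15,utf-8;q=0.7,*;q=0.7\r\n"
    "Keep-Alive: 300\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
)


def parse_request(raw):
    """Split a raw HTTP request into (method, path, headers, entity_body).

    Hypothetical helper for illustration only; assumes well-formed input.
    """
    # A blank line separates the envelope from the document inside it.
    head, _, entity_body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, path, _version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        # Split on the first colon only; header values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, path, headers, entity_body


method, path, headers, entity_body = parse_request(RAW_REQUEST)
print(method)             # the HTTP method
print(path)               # the address on the envelope
print(len(headers))       # the number of stickers
print(repr(entity_body))  # empty for this GET request
```

The empty string returned for the entity-body corresponds to the empty envelope of a typical GET request.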
The HTTP response is also a document in an envelope. It’s almost
identical in form to the HTTP request. Example 1-6 shows a trimmed
version of what the server at oreilly.com sends my web
browser when I make the request in Example 1-5.
Example 1-6. The response to an HTTP GET request for http://www.oreilly.com/index.html
HTTP/1.1 200 OK
Date: Fri, 17 Nov 2006 15:36:32 GMT
Server: Apache
Last-Modified: Fri, 17 Nov 2006 09:05:32 GMT
Etag: "7359b7-a7fa-455d8264"
Accept-Ranges: bytes
Content-Length: 43302
Content-Type: text/html
X-Cache: MISS from www.oreilly.com
Keep-Alive: timeout=15, max=1000
Connection: Keep-Alive

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
...
<title>oreilly.com -- Welcome to O'Reilly Media, Inc.</title>
...
The response can be divided into three parts:
The HTTP response code is a numeric code that tells the client whether its request went well or poorly, and how the client should regard this envelope and its contents. In this case the GET operation must have succeeded, since the response code is 200 (“OK”). I describe the HTTP response codes in Appendix B.
Just as with the request headers, the response headers are
informational stickers slapped onto the envelope. This response has
10 headers: Date, Server, and so on.
The entity-body is, again, the document inside the envelope, and this time there actually is one! The entity-body is the fulfillment of my GET request. The rest of the response is just an envelope with stickers on it, telling the web browser how to deal with the document.
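The three parts of the response can be pulled apart in much the same way as the request. The following Python sketch is a hedged illustration, not a real HTTP client: `parse_response` is a hypothetical name, the entity-body is a short stand-in for the trimmed page (so it does not match the Content-Length header), and the closing quote on the Etag value is assumed.

```python
# The response from Example 1-6, with a stand-in for the trimmed body.
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Fri, 17 Nov 2006 15:36:32 GMT\r\n"
    "Server: Apache\r\n"
    "Last-Modified: Fri, 17 Nov 2006 09:05:32 GMT\r\n"
    'Etag: "7359b7-a7fa-455d8264"\r\n'  # closing quote assumed
    "Accept-Ranges: bytes\r\n"
    "Content-Length: 43302\r\n"
    "Content-Type: text/html\r\n"
    "X-Cache: MISS from www.oreilly.com\r\n"
    "Keep-Alive: timeout=15, max=1000\r\n"
    "Connection: Keep-Alive\r\n"
    "\r\n"
    "<html>...</html>"  # placeholder, not the real 43302-byte page
)


def parse_response(raw):
    """Split a raw HTTP response into (code, reason, headers, entity_body).

    Hypothetical helper for illustration only; assumes well-formed input.
    """
    head, _, entity_body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # The status line is "<version> <code> <reason phrase>"; the reason
    # phrase may itself contain spaces, so split at most twice.
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return int(code), reason, headers, entity_body


code, reason, response_headers, response_body = parse_response(RAW_RESPONSE)
print(code, reason)                      # the response code
print(len(response_headers))             # the number of response headers
print(response_headers["Content-Type"])  # the media type sticker
print(response_body)                     # the document inside the envelope
```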
The most important of these stickers is worth mentioning
separately. The response header
Content-Type gives the
media type of the entity-body. In this case, the media type is
text/html. This lets my web
browser know it can render the entity-body as an HTML document:
a web page.
There’s a standard list of media types (http://www.iana.org/assignments/media-types/).
The most common media types designate textual documents
(text/html), structured data, and images. In
other discussions of REST or HTTP, you may see the media type
called the “MIME type,” “content type,” or “data type.”
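To make the role of Content-Type concrete, here is a small Python sketch of how a client might choose what to do with an entity-body based on its media type. It uses the standard library's `email.message.Message` to strip any parameters (such as a charset) from the header value. The function names and the handling descriptions are hypothetical, and a real browser's logic is far more involved.

```python
from email.message import Message


def media_type_of(content_type_header):
    """Return the bare media type, ignoring parameters such as charset."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type()


def describe_handling(content_type_header):
    """Pick a (hypothetical) way to treat the entity-body."""
    media_type = media_type_of(content_type_header)
    major = media_type.split("/")[0]
    if media_type == "text/html":
        return "render as a web page"
    if major == "text":
        return "display as text"
    if major == "image":
        return "display as an image"
    return "hand off to another program or save to disk"


print(describe_handling("text/html"))
print(describe_handling("text/html; charset=ISO-8859-1"))
print(describe_handling("image/jpeg"))
```

Keying the decision on the media type rather than on the path or file extension matches the envelope metaphor: the sticker, not the address, says what kind of document is inside.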