RESTful Web Services by Sam Ruby, Leonard Richardson

Chapter 1. The Programmable Web and Its Inhabitants

When you write a computer program, you’re not limited to the algorithms you can think up. Your language’s standard library gives you some algorithms. You can get more from books, or in third-party libraries you find online. Only if you’re on the very cutting edge should you have to come up with your own algorithms.

If you’re lucky, the same is true for data. Some applications are driven entirely by the data the users type in. Sometimes data just comes to you naturally: if you’re analyzing spam, you should have no problem getting all you need. You can download a few public data sets—word lists, geographical data, lists of prime numbers, public domain texts—as though they were third-party libraries. But if you need some other kind of data, it doesn’t look good. Where’s the data going to come from? More and more often, it’s coming from the programmable web.

When you—a human being—want to find a book on a certain topic, you probably point your web browser to the URI of an online library or bookstore: say, http://www.amazon.com/.


The common term for the address of something on the Web is “URL.” I say “URI” throughout this book because that’s what the HTTP standard says. Every HTTP URI on the Web is also a URL, so you can substitute “URL” wherever I say “URI” with no loss of meaning.

You’re served a web page, a document in HTML format that your browser renders graphically. You visually scan the page for a search form, type your topic (say, “web services”) into a text box, and submit the form. At this point your web browser makes a second HTTP request, to a URI that incorporates your topic. To continue the Amazon example, the second URI your browser requests would be something like http://amazon.com/s?url=search-alias%3Dstripbooks&field-keywords=web+services.
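Here is a minimal sketch of how a topic typed into a form ends up inside a URI, using Ruby's standard uri library. The helper name search_uri is my own invention for illustration; the host, path, and parameter names are simply copied from the sample URI above, and I'm not claiming this is how any real browser or Amazon form is implemented.

```ruby
require 'uri'

# Build a search URI roughly the way a browser does when it submits a GET form.
# search_uri is a hypothetical helper; the parameter names are taken from the
# sample URI in the text, not from any documented Amazon interface.
def search_uri(keywords)
  query = URI.encode_www_form(
    'url' => 'search-alias=stripbooks',  # "=" is escaped as %3D
    'field-keywords' => keywords         # spaces are escaped as "+"
  )
  URI::HTTP.build(host: 'amazon.com', path: '/s', query: query)
end

puts search_uri('web services')
```

Nothing is sent over the network here: the script only constructs the URI that the second HTTP request would be made to.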

The web server at amazon.com responds by serving a second document in HTML format. This document contains a description of your search results, links to additional search options, and miscellaneous commercial enticements (see Example 1-1). Again, your browser renders the document in graphical form, and you look at it and decide what to do from there.

Example 1-1. Part of the HTML response from amazon.com

<a href="http://www.amazon.com/Restful-Web-Services-Leonard-Richardson/dp/...>
 <span class="srTitle">RESTful Web Services</span>
</a>
by Leonard Richardson and Sam Ruby
<span class="bindingBlock">
 (<span class="binding">Paperback</span> - May 1, 2007)
</span>

The Web you use is full of data: book information, opinions, prices, arrival times, messages, photographs, and miscellaneous junk. It’s full of services: search engines, online stores, weblogs, wikis, calculators, and games. Rather than installing all this data and all these programs on your own computer, you install one program—a web browser—and access the data and services through it.

The programmable web is just the same. The main difference is that instead of arranging its data in attractive HTML pages with banner ads and cute pastel logos, the programmable web usually serves stark, brutal XML documents. The programmable web is not necessarily for human consumption. Its data is intended as input to a software program that does something amazing.

Example 1-2 shows a Ruby script that uses the programmable web to do a traditional human web task: find the titles of books matching a keyword. It hides the web access under a programming language interface, using the Ruby/Amazon library.

Example 1-2. Searching for books with a Ruby script

#!/usr/bin/ruby -w
# amazon-book-search.rb
require 'amazon/search'

# Print a usage message and quit unless both arguments were supplied.
if ARGV.size != 2
  puts "Usage: #{$0} [Amazon Web Services AccessKey ID] [text to search for]"
  exit
end
access_key, search_request = ARGV

req = Amazon::Search::Request.new(access_key)
# For every book in the search results...
req.keyword_search(search_request, 'books', Amazon::Search::LIGHT) do |book|
  # Print the book's name and the list of authors.
  puts %{"#{book.product_name}" by #{book.authors.join(', ')}}
end

To run this program, you’ll need to sign up for an Amazon Web Services account and pass in the Access Key ID as a command-line argument. Here’s a sample run of the program:

$ ruby amazon-book-search.rb C1D4NQS41IMK2 "restful web services"
"RESTful Web Services" by Leonard Richardson, Sam Ruby
"Hacking with Ruby: Ruby and Rails for the Real World" by Mark Watson

At its best, the programmable web works the same way as the human web. When amazon-book-search.rb calls the method Amazon::Search::Request#keyword_search, the Ruby program starts acting like a web browser. It makes an HTTP request to a URI: in this case, something like http://xml.amazon.com/onca/xml3?KeywordSearch=restful+web+services&mode=books&f=xml&type=lite&page=1. The web server at xml.amazon.com responds with an XML document. This document, shown in Example 1-3, describes the search results, just like the HTML document you see in your web browser, but in a more structured form.
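To see what the client is actually asking for, you can take that URI apart. This is a rough sketch using Ruby's standard uri library; the constant SERVICE_URI and the helper request_parameters are names I made up for illustration, and the script only inspects the string without contacting any server.

```ruby
require 'uri'

# The sample web service URI from the text, kept as a plain string.
SERVICE_URI = 'http://xml.amazon.com/onca/xml3?KeywordSearch=restful+web+services' \
              '&mode=books&f=xml&type=lite&page=1'

# request_parameters is a hypothetical helper: it decodes the query string
# into a hash of parameter names and values.
def request_parameters(uri_string)
  uri = URI.parse(uri_string)
  URI.decode_www_form(uri.query).to_h
end

params = request_parameters(SERVICE_URI)
puts "Host: #{URI.parse(SERVICE_URI).host}"
params.each { |name, value| puts "#{name} = #{value}" }
```

The search text, the product category, and the desired response format all travel in the URI, just as the topic did in the browser's request.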

Example 1-3. Part of the XML response from xml.amazon.com

<ProductName>RESTful Web Services</ProductName>
 <Author>Leonard Richardson</Author>
 <Author>Sam Ruby</Author>
<ReleaseDate>01 May, 2007</ReleaseDate>

Once a web browser has submitted its HTTP request, it has a fairly easy task. It needs to render the response in a way a human being can understand. It doesn’t need to figure out what the HTTP response means: that’s the human’s job. A web service client doesn’t have this luxury. It’s programmed in advance, so it has to be both the web browser that fetches the data, and the “human” who decides what the data means. Web service clients must automatically extract meaning from HTTP responses and make decisions based on that meaning.

In Example 1-2, the web service client parses the XML document, extracts some interesting information (book titles and authors), and prints that information to standard output. The program amazon-book-search.rb is effectively a small, special-purpose web browser, relaying data to a human reader. It could easily do something else with the Amazon book data, something that didn’t rely on human intervention at all: stick the book titles into a database, maybe, or use the author information to drive a recommendation engine.
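The Ruby/Amazon library hides the parsing step, but it can be sketched with REXML, the XML parser that ships with Ruby. This is only an approximation under stated assumptions: the Result root element is a hypothetical wrapper I added so the fragment from Example 1-3 parses as a complete document, and describe_book is an illustrative helper, not part of any Amazon library.

```ruby
require 'rexml/document'

# The fragment from Example 1-3, wrapped in a hypothetical <Result> root
# element (an assumption) so that it forms a well-formed XML document.
SAMPLE_RESPONSE = <<~XML
  <Result>
   <ProductName>RESTful Web Services</ProductName>
   <Author>Leonard Richardson</Author>
   <Author>Sam Ruby</Author>
   <ReleaseDate>01 May, 2007</ReleaseDate>
  </Result>
XML

# describe_book is a hypothetical helper: it extracts the title and the
# authors, then formats them the way Example 1-2 prints them.
def describe_book(xml)
  doc = REXML::Document.new(xml)
  title = REXML::XPath.first(doc, '//ProductName').text
  authors = REXML::XPath.match(doc, '//Author').map(&:text)
  %{"#{title}" by #{authors.join(', ')}}
end

puts describe_book(SAMPLE_RESPONSE)
```

The same extracted values could just as easily be inserted into a database as printed.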

And the data doesn’t have to always flow toward the client. Just as you can bend parts of the human web to your will (by posting on your weblog or buying a book), you can write clients that modify the programmable web. You can use it as a storage space or as another source of algorithms you don’t have to write yourself. It depends on what service you need, and whether you can find someone else to provide it.

Example 1-4 is an example of a web service client that modifies the programmable web: the s3sh command shell for Ruby. It’s one of many clients written against another of Amazon’s web services: S3, or the Simple Storage Service. In Chapter 3 I cover S3’s workings in detail, so if you’re interested in using s3sh for yourself, you can read up on S3 there.

To understand this s3sh transcript, all you need to know is that Amazon S3 lets its clients store labelled pieces of data (“objects”) in labelled containers (“buckets”). The s3sh program builds an interactive programming interface on top of S3. Other clients use S3 as a backup tool or a web host. It’s a very flexible service.

Example 1-4. Manipulating the programmable web with s3sh and S3

$ s3sh
>> Service.buckets.collect { |b| b.name }
=> ["example.com"]

>> my_bucket = Bucket.find("example.com")

>> contents = open("disk_file.txt").read
=> "This text is the contents of the file disk_file.txt"

>> S3Object.store("mydir/mydocument.txt", contents, my_bucket.name)

>> my_bucket['mydir/mydocument.txt'].value
=> "This text is the contents of the file disk_file.txt"
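The idea of labelled objects in labelled buckets can be mimicked with a toy in-memory model. To be clear, ToyBucket is a hypothetical class of my own: it is not the S3 API or the library behind s3sh, and it keeps everything in a Ruby hash instead of talking to a web service.

```ruby
# A toy, in-memory stand-in for the bucket/object model described above.
# ToyBucket is hypothetical; it makes no HTTP requests and is not the S3 API.
class ToyBucket
  attr_reader :name

  def initialize(name)
    @name = name
    @objects = {}  # maps an object's label (its key) to its data
  end

  # Store a piece of data under a label, replacing any previous value.
  def store(key, value)
    @objects[key] = value
  end

  # Fetch the data stored under a label; raises KeyError if there is none.
  def [](key)
    @objects.fetch(key)
  end

  # List the labels of every object in this bucket.
  def keys
    @objects.keys
  end
end

bucket = ToyBucket.new('example.com')
bucket.store('mydir/mydocument.txt',
             'This text is the contents of the file disk_file.txt')
puts bucket['mydir/mydocument.txt']
```

Note that "mydir/mydocument.txt" is a single label, not a path through nested directories: the bucket is a flat mapping from labels to data.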

In this chapter I survey the current state of the programmable web. What technologies are being used, what architectures are they used to implement, and what design styles are the most popular? I show some real code and some real HTTP conversations, but my main goal in this chapter is to get you thinking about the World Wide Web as a way of connecting computer programs to each other, on the same terms as it connects human beings to each other.

Kinds of Things on the Programmable Web

The programmable web is based on HTTP and XML. Some parts of it serve HTML, JavaScript Object Notation (JSON), plain text, or binary documents, but most parts use XML. And it’s all based on HTTP: if you don’t use HTTP, you’re not on the web.[5] Beyond that small island of agreement there is little but controversy. The terminology isn’t set, and different people use common terms (like “REST,” the topic of this book) in ways that combine into a vague and confusing mess. What’s missing is a coherent way of classifying the programmable web. With that in place, the meanings of individual terms will become clear.

Imagine the programmable web as an ecosystem, like the ocean, containing many kinds of strange creatures. Ancient scientists and sailors classified sea creatures by their superficial appearance: whales were lumped in with the fish. Modern scientists classify animals according to their position in the evolutionary tree of all life: whales are now grouped with the other mammals. There are two analogous ways of classifying the services that inhabit the programmable web: by the technologies they use (URIs, SOAP, XML-RPC, and so on), or by the underlying architectures and design philosophies.

Usually the two systems for classifying sea creatures get along. You don’t need to do DNA tests to know that a tuna is more like a grouper than a sea anemone. But if you really want to understand why whales can’t breathe underwater, you need to stop classifying them as fish (by superficial appearance) and start classifying them as mammals (by underlying architecture).[6]

When it comes to classifying the programmable web, most of today’s terminology sorts services by their superficial appearances: the technologies they use. These classifications work in most cases, but they’re conceptually lacking and they lead to whale-fish mistakes. I’m going to present a taxonomy based on architecture, which shows how technology choices follow from underlying design principles. I’m exposing divisions I’ll come back to throughout the book, but my main purpose is to zoom in on the parts of the programmable web that can reasonably be associated with the term “REST.”

[5] Thanks to Big Web Services’ WS-Addressing standard, it’s now possible to create a web service that’s not on the Web: one that uses email or TCP as its transport protocol instead of HTTP. I don’t think absolutely everything has to be on the Web, but it does seem like you should have to call this bizarre spectacle something other than a web service. This point isn’t really important, since in practice nearly everyone uses HTTP. Thus the footnote. The only exceptions I know of are eBay’s web services, which can send you SOAP documents over email as well as HTTP.

[6] Melville, in Moby-Dick, spends much of Chapter 32 (“Cetology”) arguing that the whale is a fish. This sounds silly, but he’s not denying that whales have lungs and give milk; he’s arguing for a definition of “fish” based on appearance, as opposed to Linnaeus’s definition “from the law of nature” (ex lege naturae).
