You are previewing RESTful Web Services.

RESTful Web Services

Cover of RESTful Web Services by Leonard Richardson... Published by O'Reilly Media, Inc.
  1. RESTful Web Services
    1. SPECIAL OFFER: Upgrade this ebook with O’Reilly
    2. A Note Regarding Supplemental Files
    3. Foreword
    4. Preface
      1. The Web Is Simple
      2. Big Web Services Are Not Simple
      3. The Story of the REST
      4. Reuniting the Webs
      5. What’s in This Book?
      6. Administrative Notes
      7. Conventions Used in This Book
      8. Using Code Examples
      9. Safari® Enabled
      10. How to Contact Us
      11. Acknowledgments
    5. 1. The Programmable Web and Its Inhabitants
      1. Kinds of Things on the Programmable Web
      2. HTTP: Documents in Envelopes
      3. Method Information
      4. Scoping Information
      5. The Competing Architectures
      6. Technologies on the Programmable Web
      7. Leftover Terminology
    6. 2. Writing Web Service Clients
      1. Web Services Are Web Sites
      2. The Sample Application
      3. Making the Request: HTTP Libraries
      4. Processing the Response: XML Parsers
      5. JSON Parsers: Handling Serialized Data
      6. Clients Made Easy with WADL
    7. 3. What Makes RESTful Services Different?
      1. Introducing the Simple Storage Service
      2. Object-Oriented Design of S3
      3. Resources
      4. HTTP Response Codes
      5. An S3 Client
      6. Request Signing and Access Control
      7. Using the S3 Client Library
      8. Clients Made Transparent with ActiveResource
      9. Parting Words
    8. 4. The Resource-Oriented Architecture
      1. Resource-Oriented What Now?
      2. What’s a Resource?
      3. URIs
      4. Addressability
      5. Statelessness
      6. Representations
      7. Links and Connectedness
      8. The Uniform Interface
      9. That’s It!
    9. 5. Designing Read-Only Resource-Oriented Services
      1. Resource Design
      2. Turning Requirements Into Read-Only Resources
      3. Figure Out the Data Set
      4. Split the Data Set into Resources
      5. Name the Resources
      6. Design Your Representations
      7. Link the Resources to Each Other
      8. The HTTP Response
      9. Conclusion
    10. 6. Designing Read/Write Resource-Oriented Services
      1. User Accounts as Resources
      2. Custom Places
      3. A Look Back at the Map Service
    11. 7. A Service Implementation
      1. A Social Bookmarking Web Service
      2. Figuring Out the Data Set
      3. Resource Design
      4. Design the Representation(s) Accepted from the Client
      5. Design the Representation(s) Served to the Client
      6. Connect Resources to Each Other
      7. What’s Supposed to Happen?
      8. What Might Go Wrong?
      9. Controller Code
      10. Model Code
      11. What Does the Client Need to Know?
    12. 8. REST and ROA Best Practices
      1. Resource-Oriented Basics
      2. The Generic ROA Procedure
      3. Addressability
      4. State and Statelessness
      5. Connectedness
      6. The Uniform Interface
      7. This Stuff Matters
      8. Resource Design
      9. URI Design
      10. Outgoing Representations
      11. Incoming Representations
      12. Service Versioning
      13. Permanent URIs Versus Readable URIs
      14. Standard Features of HTTP
      15. Faking PUT and DELETE
      16. The Trouble with Cookies
      17. Why Should a User Trust the HTTP Client?
    13. 9. The Building Blocks of Services
      1. Representation Formats
      2. Prepackaged Control Flows
      3. Hypermedia Technologies
    14. 10. The Resource-Oriented Architecture Versus Big Web Services
      1. What Problems Are Big Web Services Trying to Solve?
      2. SOAP
      3. WSDL
      4. UDDI
      5. Security
      6. Reliable Messaging
      7. Transactions
      8. BPEL, ESB, and SOA
      9. Conclusion
    15. 11. Ajax Applications as REST Clients
      1. From AJAX to Ajax
      2. The Ajax Architecture
      3. A Example
      4. The Advantages of Ajax
      5. The Disadvantages of Ajax
      6. REST Goes Better
      7. Making the Request
      8. Handling the Response
      9. JSON
      10. Don’t Bogart the Benefits of REST
      11. Cross-Browser Issues and Ajax Libraries
      12. Subverting the Browser Security Model
    16. 12. Frameworks for RESTful Services
      1. Ruby on Rails
      2. Restlet
      3. Django
    17. A. Some Resources for REST and Some RESTful Resources
      1. Standards and Guides
      2. Services You Can Use
    18. B. The HTTP Response Code Top 42
      1. Three to Seven Status Codes: The Bare Minimum
      2. 1xx: Meta
      3. 2xx: Success
      4. 3xx: Redirection
      5. 4xx: Client-Side Error
      6. 5xx: Server-Side Error
    19. C. The HTTP Header Top Infinity
      1. Standard Headers
      2. Nonstandard Headers
    20. Index
    21. About the Authors
    22. Colophon
    23. SPECIAL OFFER: Upgrade this ebook with O’Reilly
O'Reilly logo

Processing the Response: XML Parsers

The entity-body is usually the most important part of an HTTP response. Where web services are concerned, the entity-body is usually an XML document, and the client gets most of the information it needs by running this document through an XML parser.

Now, there are many HTTP client libraries, but they all have exactly the same task. Given a URI, a set of headers, and a body document, the client’s job is to construct an HTTP request and send it to a certain server. Some libraries have more features than others: cookies, authentication, caching, and the other ones I mentioned. But all these extra features are implemented within the HTTP request, usually as extra headers. A library might offer an object-oriented interface (like Net::HTTP) or a file-like interface (like open-uri), but both interfaces do the same thing. There’s only one kind of HTTP client library.

But there are three kinds of XML parsers. It’s not just that some XML parsers have features that others lack, or that one interface is more natural than another. There are two basic XML parsing strategies: the document-based strategy of DOM and other tree-style parsers, and the event-based strategy of SAX and “pull” parsers. You can get a tree-style or a SAX parser for any programming language, and a pull parser for almost any language.

The document-based, tree-style strategy is the simplest of the three models. A tree-style parser models an XML document as a nested data structure. Once you’ve got this data structure, you can search and process it with XPath queries, CSS selectors, or custom navigation functions: whatever your parser supports. A DOM parser is a tree-style parser that implements a specific interface defined by the W3C.

The tree-style strategy is easy to use, and it’s the one I use the most. With a tree-style parser, the document is just an object like the other objects in your program. The big shortcoming is that you have to deal with the document as a whole. You can’t start working on the document until you’ve processed the whole thing into a tree, and you can’t avoid loading the whole document into memory. For documents that are simple but very large, this is inefficient. It would be a lot better to handle tags as they’re parsed.

Instead of a data structure, a SAX-style or pull parser turns a document into a stream of events. Starting and closing tags, XML comments, and entity declarations are all events.

A pull parser is useful when you need to handle almost every event. A pull parser lets you handle one event at a time, “pulling” the next one from the stream as needed. You can take action in response to individual events as they come in, or build up a data structure for later use—presumably a smaller data structure than the one a tree-style parser would build. You can stop parsing the document at any time and come back to it later by pulling the next event from the stream.

A SAX parser is more complex, but useful when you only care about a few of the many events that will be streaming in. You drive a SAX parser by registering callback methods with it. Once you’re done defining callbacks, you set the parser loose on a document. The parser turns the document into a series of events, and processes every event in the document without stopping. When an event comes along that matches one of your callbacks, the parser triggers that callback, and your custom code runs. Once the callback completes, the SAX parser goes back to processing events without stopping.

The advantage of the document-based approach is that it gives you random access to the document’s contents. With event-based parsers, once the events have fired, they’re gone. If you want to trigger them again you need to re-parse the document. What’s more, an event-based parser won’t notice that a malformed XML document is malformed until it tries to parse the bad spot, and crashes. Before passing a document into an event-based parser, you’ll need to make sure the document is well formed, or else accept that your callback methods can be triggered for a document that turns out not to be good.

Some programming languages come with a standard set of XML parsers. Others have a canonical third-party parser library. For the sake of performance, some languages also have bindings to fast parsers written in C. I’d like to go through the list of languages again now, and make recommendations for document- and event-based XML parsers. I’ll rate commonly available parsers on speed, the quality of their interface, how well they support XPath (for tree-style parsers), how strict they are, and whether or not they support schema-based validation. Depending on the application, a strict parser may be a good thing (because an XML document will be parsed the correct way or not at all) or a bad thing (because you want to use a service that generates bad XML).

In the sample clients given above, I showed not only how to use my favorite HTTP client library for a language, but how to use my favorite tree-style parser for that language. To show you how event-based parsers work, I’ll give two more examples of clients using Ruby’s built-in SAX and pull parsers.

Ruby: REXML, I Guess

Ruby comes with a standard XML parser library, REXML, that supports both DOM and SAX interfaces, and has good XPath support. Unfortunately, REXML’s internals put it in a strange middle ground: it’s too strict to be used to parse bad XML, but not strict enough to reject all bad XML.

I use REXML throughout this book because it’s the default choice, and because I only deal with well-formed XML. If you want to guarantee that you only deal with well-formed XML, you’ll need to install the Ruby bindings to the GNOME project’s libxml2 library (described in Other Languages” later in this chapter).

If you want to be able to handle bad markup, the best choice is hpricot, available as the hpricot gem. It’s fast (it uses a C extension), and it has an intuitive interface including support for common XPath expressions.

Example 2-9 is an implementation of the client using REXML’s SAX interface.

Example 2-9. A Ruby client using a SAX parser

#!/usr/bin/ruby -w
# delicious-sax.rb
require 'open-uri'
require 'rexml/parsers/sax2parser'

def print_my_recent_bookmarks(username, password)
  # Make an HTTPS request and read the entity-body as an XML document.
  xml = open('',
             :http_basic_authentication => [username, password])

  # Create a SAX parser whose destiny is to parse the XML entity-body.
  parser =

  # When the SAX parser encounters a 'post' tag...
  parser.listen(:start_element, ["post"]) do |uri, tag, fqtag, attributes|
    # should print out information about the tag.
    puts "#{attributes['description']}: #{attributes['href']}"

  # Make the parser fulfil its destiny to parse the XML entity-body.

# Main program.
username, password = ARGV
unless username and password
  puts "Usage: #{$0} [USERNAME] [PASSWORD]"  
print_my_recent_bookmarks(username, password)

In this program, the data isn’t parsed (or even read from the HTTP connection) until the call to SAXParser#parse. Up to that point I’m free to call listen and set up pieces of code to run in response to parser events. In this case, the only event I’m interested in is the start of a post tag. My code block gets called every time the parser finds a post tag. This is the same as parsing the XML document with a tree-style parser, and running the XPath expression “//post” against the object tree. What does my code block do? The same thing my other example programs do when they find a post tag: print out the values of the description and href attributes.

This implementation is faster and much more memory-efficient than the equivalent tree-style implementation. However, complex SAX-based programs are much more difficult to write than equivalent tree-style programs. Pull parsers are a good compromise. Example 2-10 shows a client implementation that uses REXML’s pull parser interface.

Example 2-10. A client using REXML’s pull parser

#!/usr/bin/ruby -w
# delicious-pull.rb
require 'open-uri'
require 'rexml/parsers/pullparser'

def print_my_recent_bookmarks(username, password)
  # Make an HTTPS request and read the entity-body as an XML document.
  xml = open('',
             :http_basic_authentication => [username, password])

  # Feed the XML entity-body into a pull parser
  parser =

  # Until there are no more events to pull...
  while parser.has_next?
    # ...pull the next event.
    tag = parser.pull    
    # If it's a 'post' tag...
    if tag.start_element?
      if tag[0] == 'post'       
        # Print information about the bookmark.
        attrs = tag[1]
        puts "#{attrs['description']}: #{attrs['href']}"

# Main program.
username, password = ARGV
unless username and password
  puts "Usage: #{$0} [USERNAME] [PASSWORD]"  
print_my_recent_bookmarks(username, password)

Python: ElementTree

The world is full of XML parsers for Python. There are seven different XML interfaces in the Python 2.5 standard library alone. For full details, see the Python library reference.

For tree-style parsing, the best library is ElementTree. It’s fast, it has a sensible interface, and as of Python 2.5 you don’t have to install anything because it’s in the standard library. On the downside, its support for XPath is limited to simple expressions—of course, nothing else in the standard library supports XPath at all. If you need full XPath support, try 4Suite.

Beautiful Soup is a slower tree-style parser that is very forgiving of invalid XML, and offers a programmatic interface to a document. It also handles most character set conversions automatically, letting you work with Unicode data.

For SAX-style parsing, the best choice is the xml.sax module in the standard library. The PyXML suite includes a pull parser.

Java: javax.xml, Xerces, or XMLPull

Java 1.5 includes the XML parser written by the Apache Xerces project. The core classes are found in the packages javax.xml.*, (for instance, javax.xml.xpath). The DOM interface lives in org.w3c.dom.*, and the SAX interface lives in org.xml.sax.*. If you’re using a previous version of Java, you can install Xerces yourself and take advantage of the same interface found in Java 1.5 (

There are a variety of pull parsers for Java. Sun’s Web Services Developer Pack includes a pull parser in the package.

For parsing bad XML, you might try TagSoup.

C#: System.Xml.XmlReader

The.NET Common Language Runtime comes with a pull parser interface, in contrast to the more typical (and more complex) SAX-style interface. You can also create a full W3C DOM tree using XmlDocument. The XPathDocument class lets you iterate over nodes in the tree that match an XPath expression.

If you need to handle broken XML documents, check out Chris Lovett’s SgmlReader at


You can create a SAX-style parser with the function xml_parser_create, and a pull parser with the XMLReader extension. The DOM PHP extension (included in PHP 5) provides a tree-style interface to the GNOME project’s libxml2 C library. You might have an easier time using SimpleXML, a tree-style parser that’s not an official DOM implementation. That’s what I used in Example 2-8.

There’s also a pure PHP DOM parser called DOMIT!.

JavaScript: responseXML

If you’re using XMLHttpRequest to write an Ajax client, you don’t have to worry about the XML parser at all. If you make a request and the response entity-body is in XML format, the web browser parses it with its own tree-style parser, and makes it available through the responseXML property of the XMLHttpRequest object. You manipulate this document with JavaScript DOM methods: the same ones you use to manipulate HTML documents displayed in the browser. Chapter 11 has more information on how to use responseXML—and how to handle non-XML documents with the responseData member.

There’s a third-party XML parser, XML for <SCRIPT>, which works independently of the parser built into the client’s web browser. “XML for <SCRIPT>” offers DOM and SAX interfaces, and supports XPath queries.

Other Languages


When you load a URI with XML.load, it’s automatically parsed into an XML object, which exposes a tree-style interface.


Expat is the most popular SAX-style parser. The GNOME project’s libxml2 contains DOM, pull, and SAX parsers.


You can use either of the C parsers, or the object-oriented Xerces-C++ parser. Like the Java version of Xerces, Xerces-C++ exposes both DOM and SAX interfaces.

Common Lisp

Use SXML. It exposes a SAX-like interface, and can also turn an XML document into tree-like S-expressions or Lisp data structures.


As with Python, there are a variety of XML parsers for Perl. They’re all available on CPAN. XML::XPath has XPath support, and XML::Simple turns an XML document into standard Perl data structures. For SAX-style parsing, use XML::SAX::PurePerl. For pull parsing, use XML::LibXML::Reader. The Perl XML FAQ has an overview of the most popular Perl XML libraries.

The best content for your career. Discover unlimited learning on demand for around $1/day.