RESTful Web Services by Sam Ruby, Leonard Richardson

An S3 Client

The Amazon sample libraries, and the third-party contributions like AWS::S3, eliminate much of the need for custom S3 client libraries. But I’m not telling you about S3 just so you’ll know about a useful web service. I want to use it to illustrate the theory behind REST. So I’m going to write a Ruby S3 client of my own, and dissect it for you as I go along.

Just to show it can be done, my library will implement an object-oriented interface, like the one from Example 3-1, on top of the S3 service. The result will look like ActiveRecord or some other object-relational mapper. Instead of making SQL calls under the covers to store data in a database, though, it’ll make HTTP requests under the covers to store data on the S3 service. Rather than give my methods resource-specific names like getBuckets and getObjects, I’ll try to use names that reflect the underlying RESTful interface: get, put, and so on.

The first thing I need is an interface to Amazon’s rather unusual web service authorization mechanism. But that’s not as interesting as seeing the web service in action, so I’m going to skip it for now. I’m going to create a very small Ruby module called S3::Authorized, just so my other S3 classes can include it. I’ll come back to it at the end, and fill in the details.

Example 3-3 shows a bit of throat-clearing code.

Example 3-3. S3 Ruby client: Initial code

#!/usr/bin/ruby -w
# S3lib.rb

# Libraries necessary for making HTTP requests and parsing responses.
require 'rubygems'
require 'rest-open-uri'
require 'rexml/document'

# Libraries necessary for request signing
require 'openssl'
require 'digest/sha1'
require 'base64'
require 'uri'

module S3 # This is the beginning of a big, all-encompassing module.

module Authorized
  # Enter your public key (Amazon calls it an "Access Key ID") and
  # your private key (Amazon calls it a "Secret Access Key"). This is
  # so you can sign your S3 requests and Amazon will know who to
  # charge.
  @@public_key = ''
  @@private_key = ''

  if @@public_key.empty? or @@private_key.empty?
    raise "You need to set your S3 keys."
  end

  # You shouldn't need to change this unless you're using an S3 clone like
  # Park Place.
  HOST = 'https://s3.amazonaws.com/'
end

The only interesting aspect of this bare-bones S3::Authorized is that it’s where you should plug in the two cryptographic keys associated with your Amazon Web Services account. Every S3 request you make includes your public key (Amazon calls it an “Access Key ID”) so that Amazon can identify you. Every request you make must be cryptographically signed with your private key (Amazon calls it a “Secret Access Key”) so that Amazon knows it’s really you. I’m using the standard cryptographic terms, even though your “private key” is not totally private: Amazon knows it too. It is private in the sense that you should never reveal it to anyone else. If you do, the person you reveal it to will be able to make S3 requests and have Amazon charge you for them.
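The division of labor between the two keys can be sketched in a few lines. This is only an illustration, and it rests on assumptions: it assumes an HMAC-SHA1 signature (which the openssl and base64 requires in Example 3-3 hint at) and an Authorization header of the form "AWS public-key:signature". The helper name authorization_header is hypothetical, the credentials are placeholders, and string_to_sign stands in for the description of the request that actually gets signed, whose exact format is not covered here.

```ruby
require 'openssl'
require 'base64'

# Hypothetical helper, not part of the library in this chapter.
# The public key travels in the clear so the service can identify the
# caller; the private key never leaves the client and is only used to
# compute the signature.
def authorization_header(public_key, private_key, string_to_sign)
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha1'),
                                private_key, string_to_sign)
  signature = Base64.strict_encode64(digest)
  "AWS #{public_key}:#{signature}"
end

# Placeholder credentials and a placeholder string to sign.
puts authorization_header('EXAMPLE-ACCESS-KEY-ID', 'example-secret-key',
                          'placeholder description of the request')
```

Anyone who holds the private key can produce a valid signature, which is why revealing it lets someone else run up your bill.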

The Bucket List

Example 3-4 shows an object-oriented class for my first resource, the list of buckets. I’ll call the class for this resource S3::BucketList.

Example 3-4. S3 Ruby client: the S3::BucketList class

# The bucket list.
class BucketList
  include Authorized

  # Fetch all the buckets this user has defined.
  def get
    buckets = []

    # GET the bucket list URI and read an XML document from it.
    doc = REXML::Document.new(open(HOST).read)

    # For every bucket...
    REXML::XPath.each(doc, "//Bucket/Name") do |e|
      # ...create a new Bucket object and add it to the list.
      buckets << Bucket.new(e.text) if e.text
    end
    return buckets
  end
end

Now my file is a real web service client. If I call S3::BucketList#get I make a secure HTTP GET request to https://s3.amazonaws.com/, which happens to be the URI of the resource “a list of your buckets.” The S3 service sends back an XML document that looks something like Example 3-5. This is a representation (as I’ll start calling it in the next chapter) of the resource “a list of your buckets.” It’s just some information about the current state of that list. The Owner tag makes it clear whose bucket list it is (my AWS account name is evidently “leonardr28”), and the Buckets tag contains a number of Bucket tags describing my buckets (in this case, there’s one Bucket tag and one bucket).

Example 3-5. A sample “list of your buckets”

<?xml version='1.0' encoding='UTF-8'?>
<ListAllMyBucketsResult xmlns='http://s3.amazonaws.com/doc/2006-03-01/'>

For purposes of this small client application, the Name is the only aspect of a bucket I’m interested in. The XPath expression //Bucket/Name gives me the name of every bucket, which is all I need to create Bucket objects.
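To see the XPath expression at work without touching the network, here is a minimal sketch that runs it against a hand-written document. The document is made-up sample data shaped like a bucket list (the owner and bucket names are invented), and bucket_names is a hypothetical helper rather than part of the library.

```ruby
require 'rexml/document'

# Made-up sample data in the general shape of a bucket list.
SAMPLE_BUCKET_LIST = <<~XML
  <?xml version='1.0' encoding='UTF-8'?>
  <ListAllMyBucketsResult xmlns='http://s3.amazonaws.com/doc/2006-03-01/'>
    <Owner>
      <DisplayName>example-owner</DisplayName>
    </Owner>
    <Buckets>
      <Bucket>
        <Name>example-bucket</Name>
      </Bucket>
      <Bucket>
        <Name>another-bucket</Name>
      </Bucket>
    </Buckets>
  </ListAllMyBucketsResult>
XML

# Hypothetical helper: returns the text of every Name element that sits
# directly inside a Bucket element, in document order.
def bucket_names(xml)
  doc = REXML::Document.new(xml)
  names = []
  REXML::XPath.each(doc, '//Bucket/Name') do |e|
    names << e.text if e.text
  end
  names
end

puts bucket_names(SAMPLE_BUCKET_LIST).inspect
```

Everything else in the document (the owner, and any other properties a bucket might carry) is simply ignored by the expression.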

As we’ll see, one thing that’s missing from this XML document is links. The document gives the name of every bucket, but says nothing about where the buckets can be found on the Web. In terms of the REST design criteria, this is the major shortcoming of Amazon S3. Fortunately, it’s not too difficult to program a client to calculate a URI from the bucket name. I just follow the rule I gave earlier: https://s3.amazonaws.com/{name-of-bucket}.
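The rule is simple enough to write down as a function. This is a sketch, not the library's code: bucket_uri is a hypothetical name, and it uses ERB::Util.url_encode for escaping because URI.escape, which the listings in this chapter rely on, is not available in Ruby 3.0 and later.

```ruby
require 'erb'

# The service root, as in Example 3-3.
HOST = 'https://s3.amazonaws.com/'

# Hypothetical helper: the client computes the bucket's URI from its
# name, because the bucket list contains no links to follow.
def bucket_uri(bucket_name)
  HOST + ERB::Util.url_encode(bucket_name)
end

puts bucket_uri('crummy.com')
puts bucket_uri('my bucket')
```

The knowledge of how to build the URI lives in the client, which is exactly the shortcoming described above: if the rule ever changed, every client would have to change with it.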

The Bucket

Now, let’s write the S3::Bucket class, so that S3::BucketList.get will have something to instantiate (Example 3-6).

Example 3-6. S3 Ruby client: the S3::Bucket class

# A bucket that you've stored (or will store) on the S3 application.
class Bucket
  include Authorized
  attr_accessor :name

  def initialize(name)
    @name = name
  end

  # The URI to a bucket is the service root plus the bucket name.
  def uri
    HOST + URI.escape(name)
  end

  # Stores this bucket on S3. Analogous to ActiveRecord::Base#save,
  # which stores an object in the database. See below in the
  # book text for a discussion of acl_policy.
  def put(acl_policy=nil)
    # Set the HTTP method as an argument to open(). Also set the S3
    # access policy for this bucket, if one was provided.
    args = {:method => :put}
    args["x-amz-acl"] = acl_policy if acl_policy

    # Send a PUT request to this bucket's URI.
    open(uri, args)
    return self
  end

  # Deletes this bucket. This will fail with HTTP status code 409
  # ("Conflict") unless the bucket is empty.
  def delete
    # Send a DELETE request to this bucket's URI.
    open(uri, :method => :delete)
  end

Here are two more web service methods: S3::Bucket#put and S3::Bucket#delete. Since the URI to a bucket uniquely identifies the bucket, deletion is simple: you send a DELETE request to the bucket URI, and it’s gone. Since a bucket’s name goes into its URI, and a bucket has no other settable properties, it’s also easy to create a bucket: just send a PUT request to its URI. As I’ll show when I write S3::Object, a PUT request is more complicated when not all the data can be stored in the URI.

Earlier I compared my S3:: classes to ActiveRecord classes, but S3::Bucket#put works a little differently from an ActiveRecord implementation of save. A row in an ActiveRecord-controlled database table has a numeric unique ID. If you take an ActiveRecord object with ID 23 and change its name, your change is reflected as a change to the database record with ID 23:

SET name="newname" WHERE id=23

The permanent ID of an S3 bucket is its URI, and the URI includes the name. If you change the name of a bucket and call put, the client doesn’t rename the old bucket on S3: it creates a new, empty bucket at a new URI with the new name. This is a result of design decisions made by the S3 programmers. It doesn’t have to be this way. The Ruby on Rails framework has a different design: when it exposes database rows through a RESTful web service, the URI to a row incorporates its numeric database ID. If S3 were a Rails service you’d see buckets at URIs like /buckets/23. Renaming the bucket wouldn’t change the URI.
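The consequence of using the URI as the ID is easy to demonstrate with a toy model. Nothing here talks to S3: FakeService is a hypothetical in-memory stand-in that stores resources in a hash keyed by URI, and ToyBucket is a stripped-down imitation of S3::Bucket.

```ruby
# A toy, in-memory stand-in for the service. Resources are identified
# by URI and nothing else.
class FakeService
  def initialize
    @resources = {}   # URI => contents
  end

  # PUT creates the resource at this URI if it isn't there already.
  def put(uri)
    @resources[uri] ||= []
  end

  # DELETE removes whatever lives at this URI.
  def delete(uri)
    @resources.delete(uri)
  end

  def uris
    @resources.keys
  end
end

# A stripped-down imitation of S3::Bucket.
class ToyBucket
  attr_accessor :name

  def initialize(service, name)
    @service, @name = service, name
  end

  def uri
    "https://s3.amazonaws.com/#{name}"
  end

  def put
    @service.put(uri)
    self
  end
end

service = FakeService.new
bucket = ToyBucket.new(service, 'old-name').put

# "Renaming" the Ruby object and saving it again does not move anything:
# it creates a second resource at a second URI.
bucket.name = 'new-name'
bucket.put
puts service.uris.inspect
```

Both URIs are now in use. To get the effect of a rename, a client would have to create the new bucket, copy the contents over, and delete the old one.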

Now comes the last method of S3::Bucket, which I’ve called get. Like S3::BucketList.get, this method makes a GET request to the URI of a resource (in this case, a “bucket” resource), fetches an XML document, and parses it into new instances of a Ruby class (see Example 3-7). This method supports a variety of ways to filter the contents of S3 buckets. For instance, you can use :Prefix to retrieve only objects whose keys start with a certain string. I won’t cover these filtering options in detail. If you’re interested in them, see the S3 technical documentation on “Listing Keys.”

Example 3-7. S3 Ruby client: the S3::Bucket class (concluded)

  # Get the objects in this bucket: all of them, or some subset.
  # If S3 decides not to return the whole bucket/subset, the second
  # return value will be set to true. To get the rest of the objects,
  # you'll need to manipulate the subset options (not covered in the
  # book text).
  # The subset options are :Prefix, :Marker, :Delimiter, :MaxKeys.
  # For details, see the S3 docs on "Listing Keys".
  def get(options={})
    # Get the base URI to this bucket, and append any subset options
    # onto the query string.
    uri = uri()
    suffix = '?'

    # For every option the user provided...
    options.each do |param, value|
      # ...if it's one of the S3 subset options...
      if [:Prefix, :Marker, :Delimiter, :MaxKeys].member? param
        # ...add it to the URI.
        uri << suffix << param.to_s << '=' << URI.escape(value.to_s)
        suffix = '&'
      end
    end

    # Now we've built up our URI. Make a GET request to that URI and
    # read an XML document that lists objects in the bucket.
    doc = REXML::Document.new(open(uri).read)
    there_are_more = REXML::XPath.first(doc, "//IsTruncated").text == "true"

    # Build a list of S3::Object objects.
    objects = []
    # For every object in the bucket...
    REXML::XPath.each(doc, "//Contents/Key") do |e|
      # ...build an S3::Object object and append it to the list.
      objects << Object.new(self, e.text) if e.text
    end
    return objects, there_are_more
  end
end
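The query-string handling in get can be tried on its own. The following sketch isolates it in a hypothetical helper, with_subset_options, and uses URI.encode_www_form from the standard library instead of building the string by hand. The parameter names simply mirror the option symbols used in Example 3-7; check the exact spelling the service expects against the S3 documentation on "Listing Keys".

```ruby
require 'uri'

# The subset options recognized by Example 3-7.
SUBSET_OPTIONS = [:Prefix, :Marker, :Delimiter, :MaxKeys]

# Hypothetical helper: appends the recognized options to base_uri as a
# query string, and silently drops anything else.
def with_subset_options(base_uri, options = {})
  selected = options.select { |param, _value| SUBSET_OPTIONS.member?(param) }
  return base_uri if selected.empty?
  base_uri + '?' + URI.encode_www_form(selected)
end

puts with_subset_options('https://s3.amazonaws.com/example-bucket',
                         :Prefix => 'photos/', :MaxKeys => 10, :Bogus => 'x')
```

Note that the membership test has to look at the value of param, not at the literal symbol :param, or no option will ever be recognized.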

Make a GET request to the application’s root URI, and you get a representation of the resource “a list of your buckets.” Make a GET request to the URI of a “bucket” resource, and you get a representation of the bucket: an XML document like the one in Example 3-8, containing a Contents tag for every element of the bucket.

Example 3-8. A sample bucket representation

<?xml version='1.0' encoding='UTF-8'?>  
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">

In this case, the portion of the document I find interesting is the list of a bucket’s objects. An object is identified by its key, and I use the XPath expression “//Contents/Key” to fetch that information. I’m also interested in a certain Boolean variable (“//IsTruncated”): whether this document contains keys for every object in the bucket, or whether S3 decided there were too many to send in one document and truncated the list.

Again, the main thing missing from this representation is links. The document lists lots of information about the objects, but not their URIs. The client is expected to know how to turn an object name into that object’s URI. Fortunately, it’s not too hard to build an object’s URI, using the rule I already gave: https://s3.amazonaws.com/{name-of-bucket}/{name-of-object}.
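The truncation flag matters to any client that wants the complete list of keys. Here is one way a caller might keep asking until the list is no longer truncated. It is a sketch under stated assumptions: fake_get is a hypothetical stand-in for S3::Bucket#get that serves canned keys, and it assumes that a :Marker option means "list only the keys that sort after this one."

```ruby
# Canned data for the stand-in below.
ALL_KEYS = %w[a.txt b.txt c.txt d.txt e.txt]

# Hypothetical stand-in for S3::Bucket#get. Returns a page of keys and
# a flag saying whether the list was truncated.
def fake_get(options = {})
  max = options[:MaxKeys] || 2
  marker = options[:Marker]
  remaining = marker ? ALL_KEYS.select { |k| k > marker } : ALL_KEYS
  page = remaining.first(max)
  [page, remaining.size > page.size]
end

# Collects every key by requesting page after page, using the last key
# of each page as the marker for the next request.
def all_keys(max_keys = 2)
  keys = []
  marker = nil
  loop do
    options = { :MaxKeys => max_keys }
    options[:Marker] = marker if marker
    page, there_are_more = fake_get(options)
    keys.concat(page)
    break unless there_are_more && !page.empty?
    marker = page.last
  end
  keys
end

puts all_keys.inspect
```

With the real class, the loop body would call get on a bucket and read the name of the last S3::Object in each page.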

The S3 Object

Now we’re ready to implement an interface to the core of the S3 service: the object. Remember that an S3 object is just a data string that’s been given a name (a key) and a set of metadata key-value pairs (such as Content-Type="text/html"). When you send a GET request to the bucket list, or to a bucket, S3 serves an XML document that you have to parse. When you send a GET request to an object, S3 serves whatever data string you PUT there earlier—byte for byte.

Example 3-9 shows the beginning of S3::Object, which should be nothing new by now.

Example 3-9. S3 Ruby client: the S3::Object class

# An S3 object, associated with a bucket, containing a value and metadata.
class Object
  include Authorized

  # The client can see which Bucket this Object is in.
  attr_reader :bucket
  # The client can read and write the name of this Object.
  attr_accessor :name

  # The client can write this Object's metadata and value.
  # I'll define the corresponding "read" methods later.
  attr_writer :metadata, :value

  def initialize(bucket, name, value=nil, metadata=nil)
    @bucket, @name, @value, @metadata = bucket, name, value, metadata
  end

  # The URI to an Object is the URI to its Bucket, and then its name.
  def uri
    @bucket.uri + '/' + URI.escape(name)
  end

What comes next is my first implementation of an HTTP HEAD request. I use it to fetch an object’s metadata key-value pairs and populate the metadata hash with them (the actual implementation of store_metadata comes at the end of this class). Since I’m using rest-open-uri, the code to make the HEAD request looks the same as the code to make any other HTTP request (see Example 3-10).

Example 3-10. S3 Ruby client: the S3::Object#metadata method

  # Retrieves the metadata hash for this Object, possibly fetching
  # it from S3.
  def metadata
    # If there's no metadata yet...
    unless @metadata
      # Make a HEAD request to this Object's URI, and read the metadata
      # from the HTTP headers in the response.
      begin
        store_metadata(open(uri, :method => :head).meta)
      rescue OpenURI::HTTPError => e
        if e.io.status == ["404", "Not Found"]
          # If the Object doesn't exist, there's no metadata and this is not
          # an error.
          @metadata = {}
        else
          # Otherwise, this is an error.
          raise e
        end
      end
    end
    return @metadata
  end

The goal here is to fetch an object’s metadata without fetching the object itself. This is the difference between downloading a movie review and downloading the movie, and when you’re paying for the bandwidth it’s a big difference. This distinction between metadata and representation is not unique to S3, and the solution is general to all resource-oriented web services. The HEAD method gives any client a way of fetching the metadata for any resource, without also fetching its (possibly enormous) representation.
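When bandwidth costs money, it also pays to make the HEAD request at most once. The metadata method does this by checking @metadata before going to the network. The sketch below isolates that pattern; CachedMetadata is a hypothetical class, and the block passed to it stands in for the real HEAD request so the example runs offline.

```ruby
# Hypothetical class illustrating "fetch on first use, then remember".
class CachedMetadata
  attr_reader :requests_made

  # The block stands in for a HEAD request and must return a hash of
  # headers.
  def initialize(&head_request)
    @head_request = head_request
    @requests_made = 0
    @metadata = nil
  end

  def metadata
    # Only go to the "network" if there's no metadata yet.
    unless @metadata
      @requests_made += 1
      @metadata = @head_request.call
    end
    @metadata
  end
end

object = CachedMetadata.new { { 'content-type' => 'text/html' } }
object.metadata
object.metadata
puts object.requests_made
```

An empty hash counts as "already fetched" in Ruby, so an object with no metadata does not trigger a new request on every call.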

Of course, sometimes you do want to download the movie, and for that you need a GET request. I’ve put the GET request in the accessor method S3::Object#value, in Example 3-11. Its structure mirrors that of S3::Object#metadata.

Example 3-11. S3 Ruby client: the S3::Object#value method

  # Retrieves the value of this Object, possibly fetching it
  # (along with the metadata) from S3.
  def value
    # If there's no value yet...
    unless @value
      # Make a GET request to this Object's URI.
      response = open(uri)
      # Read the metadata from the HTTP headers in the response.
      store_metadata(response.meta) unless @metadata
      # Read the value from the entity-body
      @value = response.read
    end
    return @value
  end

The client stores objects on the S3 service the same way it stores buckets: by sending a PUT request to a certain URI. The bucket PUT is trivial because a bucket has no distinguishing features other than its name, which goes into the URI of the PUT request. An object PUT is more complex. This is where the HTTP client specifies an object’s metadata (such as Content-Type) and value. This information will be made available on future HEAD and GET requests.

Fortunately, setting up the PUT request is not terribly complicated, because an object’s value is whatever the client says it is. I don’t have to wrap the object’s value in an XML document or anything. I just send the data as is, and set HTTP headers that correspond to the items of metadata in my metadata hash (see Example 3-12).

Example 3-12. S3 Ruby client: the S3::Object#put method

  # Store this Object on S3.
  def put(acl_policy=nil)

    # Start from a copy of the original metadata, or an empty hash if
    # there is no metadata yet.
    args = @metadata ? @metadata.clone : {}

    # Set the HTTP method, the entity-body, and some additional HTTP
    # headers.
    args[:method] = :put
    args["x-amz-acl"] = acl_policy if acl_policy
    if @value
      # Content-Length is measured in bytes, not characters.
      args["Content-Length"] = @value.bytesize.to_s
      args[:body] = @value
    end

    # Make a PUT request to this Object's URI.
    open(uri, args)
    return self
  end
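For comparison, here is roughly what the same request looks like when assembled with Ruby's standard Net::HTTP library instead of rest-open-uri. This is a sketch that only builds the request object and never sends it. build_put is a hypothetical helper, the URI and metadata are made-up examples, and the request is unsigned, so the real service would reject it.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) a PUT request whose
# entity-body is the object's value and whose headers carry its metadata.
def build_put(uri_string, value, metadata = {}, acl_policy = nil)
  request = Net::HTTP::Put.new(URI(uri_string))
  metadata.each { |header, v| request[header] = v }
  request['x-amz-acl'] = acl_policy if acl_policy
  # The value goes out as is: no XML envelope, no encoding step.
  request.body = value
  request
end

request = build_put('https://s3.amazonaws.com/example-bucket/hello.html',
                    '<p>Hello</p>',
                    { 'Content-Type' => 'text/html',
                      'x-amz-meta-author' => 'example-author' },
                    'public-read')
puts request.method
request.each_header { |name, v| puts "#{name}: #{v}" }
```

The division is the same as in Example 3-12: metadata travels in the headers, and the value travels in the entity-body.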

The S3::Object#delete implementation (see Example 3-13) is identical to S3::Bucket#delete.

Example 3-13. S3 Ruby client: the S3::Object#delete method

  # Deletes this Object.
  def delete
    # Make a DELETE request to this Object's URI.
    open(uri, :method => :delete)
  end

And Example 3-14 shows the method for turning HTTP response headers into S3 object metadata. Except for Content-Type, you should prefix all the metadata headers you set with the string “x-amz-meta-”. Otherwise they won’t make the round trip to the S3 server and back to a web service client. S3 will think they’re quirks of your client software and discard them.

Example 3-14. S3 Ruby client: the S3::Object#store_metadata method


  # Given a hash of headers from a HTTP response, picks out the
  # headers that are relevant to an S3 Object, and stores them in the
  # instance variable @metadata.
  def store_metadata(new_metadata)
    @metadata = {}
    new_metadata.each do |h,v|
      if RELEVANT_HEADERS.member?(h) || h.index('x-amz-meta') == 0
        @metadata[h] = v
      end
    end
  end
  RELEVANT_HEADERS = ['content-type', 'content-disposition', 'content-range',
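Going the other way, from the metadata a user wants to store to the headers of a PUT request, is a matter of applying the prefix rule. The following sketch shows one way to do it. to_headers is a hypothetical helper that is not part of the library, and lowercasing the header names is a choice made here for consistency with the lowercase names that store_metadata compares against.

```ruby
# Hypothetical helper: turns user-supplied metadata into header names
# that will survive the round trip. Content-Type passes through; every
# other name gets the "x-amz-meta-" prefix unless it already has it.
def to_headers(user_metadata)
  headers = {}
  user_metadata.each do |name, value|
    key = name.downcase
    unless key == 'content-type' || key.start_with?('x-amz-meta-')
      key = "x-amz-meta-#{key}"
    end
    headers[key] = value
  end
  headers
end

puts to_headers('Content-Type' => 'text/html',
                'author' => 'example-author').inspect
```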
