Retrieve a representation of a resource: HTTP GET
Create a new resource: HTTP PUT to a new URI, or HTTP POST to an existing URI (see the POST” section below)
Modify an existing resource: HTTP PUT to an existing URI
Delete an existing resource: HTTP DELETE
I’ll explain how these four are used to represent just about any operation you can think of. I’ll also cover two HTTP methods for two less common operations: HEAD and OPTIONS.
These three should be familiar to you from the S3 example in Chapter 3. To fetch or delete a resource, the client just sends a GET or DELETE request to its URI. In the case of a GET request, the server sends back a representation in the response entity-body. For a DELETE request, the response entity-body may contain a status message, or nothing at all.
To create or modify a resource, the client sends a PUT request that usually includes an entity-body. The entity-body contains the client’s proposed new representation of the resource. What data this is, and what format it’s in, depends on the service. Whatever it looks like, this is the point at which application state moves onto the server and becomes resource state.
Again, think of the S3 service, where there are two kinds of resources you can create: buckets and objects. To create an object, you send a PUT request to its URI and include the object’s content in the entity-body of your request. You do the same thing to modify an object: the new content overwrites any old content.
Creating a bucket is a little different because you don’t have to specify an entity-body in the PUT request. A bucket has no resource state except for its name, and the name is part of the URI. (This is not quite true. The objects in a bucket are also elements of that bucket’s resource state: after all, they’re listed when you GET a bucket’s representation. But every S3 object is a resource of its own, so there’s no need to manipulate an object through its bucket. Every object exposes the uniform interface and you can manipulate it separately.) Specify the bucket’s URI and you’ve specified its representation. PUT requests for most resources do include an entity-body containing a representation, but as you can see it’s not a requirement.
Retrieve a metadata-only representation: HTTP HEAD
Check which HTTP methods a particular resource supports: HTTP OPTIONS
You saw the HEAD method exposed by the S3 services’s resources in Chapter 3. An S3 client uses HEAD to fetch metadata about a resource without downloading the possibly enormous entity-body. That’s what HEAD is for. A client can use HEAD to check whether a resource exists, or find out other information about the resource, without fetching its entire representation. HEAD gives you exactly what a GET request would give you, but without the entity-body.
There are two standard HTTP methods I don’t cover in this book: TRACE and CONNECT. TRACE is used to debug proxies, and CONNECT is used to forward some other protocol through an HTTP proxy.
The OPTIONS method lets the client discover what it’s allowed to
do to a resource. The response to an OPTIONS request contains the HTTP
Allow header, which lays out the
subset of the uniform interface this resource supports. Here’s a
Allow: GET, HEAD
That particular header means the client can expect the server to act reasonably to a GET or HEAD request for this resource, but that the resource doesn’t support any other HTTP methods. Effectively, this resource is read-only.
The headers the client sends in the request may affect the
Allow header the server sends in
response. For instance, if you send a proper
Authorization header along with an OPTIONS
request, you may find that you’re allowed to make GET, HEAD, PUT, and
DELETE requests against a particular URI. If you send the same OPTIONS
request but omit the
header, you may find that you’re only allowed to make GET and HEAD
requests. The OPTIONS method lets the client do simple access control
In theory, the server can send additional information in
response to an OPTIONS request, and the client can send OPTIONS
requests that ask very specific questions about the server’s
capabilities. Very nice, except there are no accepted standards for
what a client might ask in an OPTIONS request. Apart from the
Allow header there are no accepted standards
for what a server might send in response. Most web servers and
frameworks feature very poor support for OPTIONS. So far, OPTIONS is a
promising idea that nobody uses.
Now we come to that most misunderstood of HTTP methods: POST. This method essentially has two purposes: one that fits in with the constraints of REST, and one that goes outside REST and introduces an element of the RPC style. In complex cases like this it’s best to go back to the original text. Here’s what RFC 2616, the HTTP standard, says about POST (this is from section 9.5):
POST is designed to allow a uniform method to cover the following functions:
Annotation of existing resources;
Posting a message to a bulletin board, newsgroup, mailing list, or similar group of articles;
Providing a block of data, such as the result of submitting a form, to a data-handling process;
Extending a database through an append operation.
The actual function performed by the POST method is determined by the server and is usually dependent on the Request-URI. The posted entity is subordinate to that URI in the same way that a file is subordinate to a directory containing it, a news article is subordinate to a newsgroup to which it is posted, or a record is subordinate to a database.
What does this mean in the context of REST and the ROA?
In a RESTful design, POST is commonly used to create subordinate resources: resources that exist in
relation to some other “parent” resource. A weblog program may
expose each weblog as a resource (
/weblogs/myweblog), and the individual
weblog entries as subordinate resources (
web-enabled database may expose a table as a resource, and the
individual database rows as its subordinate resources. To create a
weblog entry or a database record, you POST to the parent: the
weblog or the database table. What data you post, and what format
it’s in, depends on the service, but as with PUT, this is the point
where application state becomes resource state. You may see this use
of POST called POST(a), for “append”. When I
say “POST” in this book, I almost always mean POST(a).
Why can’t you just use PUT to create subordinate resources? Well, sometimes you can. An S3 object is a subordinate resource: every S3 object is contained in some S3 bucket. But we don’t create an S3 object by sending a POST request to the bucket. We send a PUT request directly to the URI of the object. The difference between PUT and POST is this: the client uses PUT when it’s in charge of deciding which URI the new resource should have. The client uses POST when the server is in charge of deciding which URI the new resource should have.
The S3 service expects clients to create S3 objects with PUT, because an S3 object’s URI is completely determined by its name and the name of the bucket. If the client knows enough to create the object, it knows what its URI will be. The obvious URI to use as the target of the PUT request is the one the bucket will live at once it exists.
But consider an application in which the server has more
control over the URIs: say, a weblog program. The client can gather
all the information necessary to create a weblog entry, and still
not know what URI the entry will have once created. Maybe the server
bases the URIs on ordering or an internal database ID: will the
final URI be
/weblogs/myweblog/entries/1000? Maybe the
final URI is based on the posting time: what time does the server
think it is? The client shouldn’t have to know these things.
The POST method is a way of creating a new resource without the client having to know its exact URI. In most cases the client only needs to know the URI of a “parent” or “factory” resource. The server takes the representation from the entity-body and uses it to create a new resource “underneath” the “parent” resource (the meaning of “underneath” depends on context).
The response to this sort of POST request usually has an HTTP
status code of 201 (“Created”). Its
Location header contains the URI of the
newly created subordinate resource. Now that the resource actually
exists and the client knows its URI, future requests can use the PUT
method to modify that resource, GET to fetch a representation of it,
and DELETE to delete it.
Table 4-1 shows how a PUT request to a URI might create or modify the underlying resource; and how a POST request to the same URI might create a new, subordinate resource.
Table 4-1. PUT actions
|PUT to a new resource||PUT to an existing resource||POST|
|N/A (resource already exists)||No effect||Create a new weblog|
|Create this weblog||Modify this weblog’s settings||Create a new weblog entry|
|N/A (how would you get this URI?)||Edit this weblog entry||Post a comment to this weblog entry|
The information conveyed in a POST to a resource doesn’t have to result in a whole new subordinate resource. Sometimes when you POST data to a resource, it appends the information you POSTed to its own state, instead of putting it in a new resource.
Consider an event logging service that exposes a single
resource: the log. Say its URI is
/log. To get the log you send a GET
Now, how should a client append to the log? The client might
send a PUT request to
the PUT method has the implication of creating a new resource, or
overwriting old settings with
new ones. The client isn’t doing either: it’s just appending
information to the end of the log.
The POST method works here, just as it would if each log entry was exposed as a separate resource. The semantics of POST are the same in both cases: the client adds subordinate information to an existing resource. The only difference is that in the case of the weblog and weblog entries, the subordinate information showed up as a new resource. Here, the subordinate information shows up as new data in the parent’s representation.
That way of looking at things explains most of what the HTTP standard says about POST. You can use it to create resources underneath a parent resource, and you can use it to append extra data onto the current state of a resource. The one use of POST I haven’t explained is the one you’re probably most familiar with, because it’s the one that drives almost all web applications: providing a block of data, such as the result of submitting a form, to a data-handling process.
What’s a “data-handling process”? That sounds pretty vague. And, indeed, just about anything can be a data-handling process. Using POST this way turns a resource into a tiny message processor that acts like an XML-RPC server. The resource accepts a POST request, examines the request, and decides to do... something. Then it decides to serve to the client... some data.
I call this use of POST overloaded POST, by analogy to operator overloading in a programming language. It’s overloaded because a single HTTP method is being used to signify any number of non-HTTP methods. It’s confusing for the same reason operator overloading can be confusing: you thought you knew what HTTP POST did, but now it’s being used to achieve some unknown purpose. You might see overloaded POST called POST(p), for “process.”
When your service exposes overloaded POST, you reopen the question: “why should the server do this instead of that?” Every HTTP request has to contain method information, and when you use overloaded POST it can’t go into the HTTP method. The POST method is just a directive to the server, saying: “Look inside the HTTP request for the real method information.” The real information may be in the URI, the HTTP headers, or the entity-body. However it happens, an element of the RPC style has crept into the service.
When the method information isn’t found in the HTTP method, the interface stops being uniform. The real method information might be anything. As a REST partisan I don’t like this very much, but occasionally it’s unavoidable. By Chapter 9 you’ll have seen how just about any scenario you can think of can be exposed through HTTP’s uniform interface, but sometimes the RPC style is the easiest way to express complex operations that span multiple resources.
You may need to expose overloaded POST even if you’re only using POST to create subordinate resources or to append to a resource’s representation. What if a single resource supports both kinds of POST? How does the server know whether a client is POSTing to create a subordinate resource, or to append to the existing resource’s representation? You may need to put some additional method information elsewhere in the HTTP request.
Overloaded POST should not be used to cover up poor resource design. Remember, a resource can be anything. It’s usually possible to shuffle your resource design so that the uniform interface applies, rather than introduce the RPC style into your service.
If you expose HTTP’s uniform interface as it was designed, you get two useful properties for free. When correctly used, GET and HEAD requests are safe. GET, HEAD, PUT and DELETE requests are idempotent.
A GET or HEAD request is a request to read some data, not a request to change any server state. The client can make a GET or HEAD request 10 times and it’s the same as making it once, or never making it at all. When you GET http://www.google.com/search?q=jellyfish, you aren’t changing anything about the directory of jellyfish resources. You’re just retrieving a representation of it. A client should be able to send a GET or HEAD request to an unknown URI and feel safe that nothing disastrous will happen.
This is not to say that GET and HEAD requests can’t have side effects. Some resources are hit counters that increment every time a client GETs them. Most web servers log every incoming request to a log file. These are side effects: the server state, and even the resource state, is changing in response to a GET request. But the client didn’t ask for the side effects, and it’s not responsible for them. A client should never make a GET or HEAD request just for the side effects, and the side effects should never be so big that the client might wish it hadn’t made the request.
Idempotence is a slightly tricker notion. The idea comes from math, and if you’re not familiar with idempotence, a math example might help. An idempotent operation in math is one that has the same effect whether you apply it once, or more than once. Multiplying a number by zero is idempotent: 4 ×0 ×0 ×0 is the same as 4 ×0.By analogy, an operation on a resource is idempotent if making one request is the same as making a series of identical requests. The second and subsequent requests leave the resource state in exactly the same state as the first request did.
PUT and DELETE operations are idempotent. If you DELETE a resource, it’s gone. If you DELETE it again, it’s still gone. If you create a new resource with PUT, and then resend the PUT request, the resource is still there and it has the same properties you gave it when you created it. If you use PUT to change the state of a resource, you can resend the PUT request and the resource state won’t change again.
The practical upshot of this is that you shouldn’t allow your clients to PUT representations that change a resource’s state in relative terms. If a resource keeps a numeric value as part of its resource state, a client might use PUT to set that value to 4, or 0, or −50, but not to increment that value by 1. If the initial value is 0, sending two PUT requests that say “set the value to 4” leaves the value at 4. If the initial value is 0, sending two PUT requests that say “increment the value by 1” leaves the value not at 1, but at 2. That’s not idempotent.
Safety and idempotence let a client make reliable HTTP requests over an unreliable network. If you make a GET request and never get a response, just make another one. It’s safe: even if your earlier request went through, it didn’t have any real effect on the server. If you make a PUT request and never get a response, just make another one. If your earlier request got through, your second request will have no additional effect.
POST is neither safe nor idempotent. Making two identical POST requests to a “factory” resource will probably result in two subordinate resources containing the same information. With overloaded POST, all bets are off.
The most common misuse of the uniform interface is to expose unsafe operations through GET. The del.icio.us and Flickr APIs both do this. When you GET https://api.del.icio.us/posts/delete, you’re not fetching a representation: you’re modifying the del.icio.us data set.
Why is this bad? Well, here’s a story. In 2005 Google released a client-side caching tool called Web Accelerator. It runs in conjunction with your web browser and “pre-fetches” the pages linked to from whatever page you’re viewing. If you happen to click one of those links, the page on the other side will load faster, because your computer has already fetched it.
Web Accelerator was a disaster. Not because of any problem in the software itself, but because the Web is full of applications that misuse GET. Web Accelerator assumed that GET operations were safe, that clients could make them ahead of time just in case a human being wanted to see the corresponding representations. But when it made those GET requests to real URIs, it changed the data sets. People lost data.
There’s plenty of blame to go around: programmers shouldn’t expose unsafe actions through GET, and Google shouldn’t have released a real-world tool that didn’t work with the real-world web. The current version of Web Accelerator ignores all URIs that contain query variables. This solves part of the problem, but it also prevents many resources that are safe to use through GET (such as Google web searches) from being pre-fetched.
Multiply the examples if you like. Many web services and web applications use URIs as input, and the first thing they do is send a GET request to fetch a representation of a resource. These services don’t mean to trigger catastrophic side effects, but it’s not up to them. It’s up to the service to handle a GET request in a way that complies with the HTTP standard.
The important thing about REST is not that you use the specific uniform interface that HTTP defines. REST specifies a uniform interface, but it doesn’t say which uniform interface. GET, PUT, and the rest are not a perfect interface for all time. What’s important is the uniformity: that every service use HTTP’s interface the same way.
The point is not that GET is the best name for a read operation, but that GET means “read” across the Web, no matter which resource you’re using it on. Given a URI of a resource, there’s no question of how you get a representation: you send an HTTP GET request to that URI. The uniform interface makes any two services as similar as any two web sites. Without the uniform interface, you’d have to learn how each service expected to receive and send information. The rules might even be different for different resources within a single service.
You can program a computer to understand what GET means, and
that understanding will apply to every RESTful web service. There’s
not much to understand. The service-specific code can live in the
handling of the representation. Without the uniform interface, you get
a multiplicity of methods taking the place of GET:
nextPrime. Every service speaks a different
language. This is also the reason I don’t like overloaded POST very
much: it turns the simple Esperanto of the uniform interface into a
Babel of one-off sublanguages.
Some applications extend HTTP’s uniform interface. The most obvious case is WebDAV, which adds eight new HTTP methods including MOVE, COPY, and SEARCH. Using these methods in a web service would not violate any precept of REST, because REST doesn’t say what the uniform interface should look like. Using them would violate my Resource-Oriented Architecture (I’ve explicitly tied the ROA to the standard HTTP methods), but your service could still be resource-oriented in a general sense.
The real reason not to use the WebDAV methods is that doing so makes your service incompatible with other RESTful services. Your service would use a different uniform interface than most other services. There are web services like Subversion that use the WebDAV methods, so your service wouldn’t be all alone. But it would be part of a much smaller web. This is why making up your own HTTP methods is a very, very bad idea: your custom vocabulary puts you in a community of one. You might as well be using XML-RPC.
Another uniform interface consists solely of HTTP GET and overloaded POST. To fetch a representation of a resource, you send GET to its URI. To create, modify, or delete a resource, you send POST. This interface is perfectly RESTful, but, again, it doesn’t conform to my Resource-Oriented Architecture. This interface is just rich enough to distinguish between safe and unsafe operations. A resource-oriented web application would use this interface, because today’s HTML forms only support GET and POST.
 Multiplying a number by one is both safe and idempotent: 4 ×1 ×1 ×1 is the same as 4 ×1, which is the same as 4. Multiplication by zero is not safe, because 4 ×0 is not the same as 4. Multiplication by any other number is neither safe nor idempotent.