What makes a resource a resource? It has to have at least one URI. The URI is the name and address of a resource. If a piece of information doesn’t have a URI, it’s not a resource and it’s not really on the Web, except as a bit of data describing some other resource.
Remember the sample session in the Preface,
when I was making fun of HTTP 0.9? Let’s say this is an HTTP 0.9 request for http://www.example.com/hello.txt:
|Client request|Server response|
|---|---|
|GET /hello.txt|Hello, world!|
An HTTP client manipulates a resource by connecting to the server
that hosts it (in this case,
www.example.com), and sending the server a
method (“GET”) and a path to the resource (“/hello.txt”). Today’s HTTP
1.1 is a little more complex than 0.9, but it works the same way. Both
the server and the path come from the resource’s URI.
|Client request|Server response|
|---|---|
|GET /hello.txt HTTP/1.1|200 OK|
|Host: www.example.com|Content-Type: text/plain|
||Hello, world!|
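To make the point about the server and the path concrete, here is a minimal sketch in Python that splits a URI into those two pieces and builds the text of an HTTP 1.1 request from them. The function name `build_get_request` is my own invention for illustration, and the sketch only produces the request text; it does not open a connection.

```python
from urllib.parse import urlsplit

def build_get_request(uri: str) -> str:
    """Return the text of a minimal HTTP/1.1 GET request for `uri`."""
    parts = urlsplit(uri)
    host = parts.netloc            # the server the client connects to
    path = parts.path or "/"       # the path the client sends to that server
    if parts.query:                # keep any query string with the path
        path += "?" + parts.query
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"

request = build_get_request("http://www.example.com/hello.txt")
print(request)
```

Everything in the request came out of the URI: the `Host` header is the server part, and the request line carries the path.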
The principles behind URIs are well described by Tim Berners-Lee in Universal Resource Identifiers—Axioms of Web Architecture. In this section I expound the principles behind constructing URIs and assigning them to resources.
The URI is the fundamental technology of the Web. There were hypertext systems before HTML, and Internet protocols before HTTP, but they didn’t talk to each other. The URI interconnected all these Internet protocols into a Web, the way TCP/IP interconnected networks like Usenet, Bitnet, and CompuServe into a single Internet. Then the Web co-opted those other protocols and killed them off, just like the Internet did with private networks.
Today we surf the Web (not Gopher), download files from the Web (not FTP sites), search publications from the Web (not WAIS), and have conversations on the Web (not Usenet newsgroups). Version control systems like Subversion and arch work over the Web, as opposed to the custom CVS protocol. Even email is slowly moving onto the Web.
The Web kills off other protocols because it has something most protocols lack: a simple way of labeling every available item. Every resource on the Web has at least one URI. You can stick a URI on a billboard. People can see that billboard, type that URI into their web browsers, and go right to the resource you wanted to show them. It may seem strange, but this everyday interaction was impossible before URIs were invented.
Here’s the first point where the ROA builds upon the sparse recommendations of the REST thesis and the W3C recommendations. I propose that a resource and its URI ought to have an intuitive correspondence. Here are some good URIs for the resources I listed above:
URIs should have a structure. They should vary in predictable
ways: you should not go to
/search/Jellyfish for jellyfish and
/i-want-to-know-about/Mice for mice. If a
client knows the structure of the service’s URIs, it can create its
own entry points into the service. This makes it easy for clients to
use your service in ways you didn’t think of.
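Here is a small Python sketch of what "creating its own entry points" can look like on the client side. It assumes a hypothetical service rooted at http://www.example.com whose searches all live under /search/{term}; both the root and the helper name `search_uri` are assumptions made for the example.

```python
from urllib.parse import quote

BASE = "http://www.example.com"    # hypothetical service root

def search_uri(term: str) -> str:
    """Build a search URI, assuming the predictable /search/{term} structure."""
    # Percent-encode the term so spaces and slashes can't break the path.
    return f"{BASE}/search/{quote(term, safe='')}"

for term in ["Jellyfish", "Mice", "Sea urchins"]:
    print(search_uri(term))
```

The client never had to be told about a search for sea urchins. Knowing the structure was enough to construct the URI. If mice lived under /i-want-to-know-about/Mice instead, a function this simple could not exist.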
This is not an absolute rule of REST, as we’ll see in the “Name the Resources” section of Chapter 5. URIs do not technically have to have any structure or predictability, but I think they should. This is one of the rules of good web design, and it shows up in RESTful and REST-RPC hybrid services alike.
Let’s consider some edge cases. Can two resources be the same? Can two URIs designate the same resource? Can a single URI designate two resources?
By definition, no two resources can be the same. If they were
the same, you’d only have one resource. However, at some moment in
time two different resources may point to the same data. If the
current software release is 1.0.3, then releases/1.0.3.tar.gz and releases/latest.tar.gz
will refer to the same file for a while. But the ideas
behind those two URIs are different: one of them always points to a
particular version, and the other points to whatever version is newest
at the time the client accesses it. That’s two concepts and two
resources. You wouldn’t link to
latest when reporting a bug in version 1.0.3.
A resource may have one URI or many. The sales numbers available at http://www.example.com/sales/2004/Q4 might also be available at http://www.example.com/sales/Q42004. If a resource has multiple URIs, it’s easier for clients to refer to the resource. The downside is that each additional URI dilutes the value of all the others. Some clients use one URI, some use another, and there’s no automatic way to verify that all the URIs refer to the same resource.
One way to get around this is to expose multiple URIs for the same resource, but have one of them be the “canonical” URI for that resource. When a client requests the canonical URI, the server sends the appropriate data along with a response code of 200 (“OK”). When a client requests one of the other URIs, the server sends a response code of 303 (“See Other”) along with the canonical URI. The client can’t see whether two URIs point to the same resource, but it can make two HEAD requests and see if one URI redirects to the other or if they both redirect to a third URI.
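The two-HEAD-request check can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not a recipe for any particular service: the tiny server below is a stand-in I made up, it runs on the local machine, and it treats /sales/2004/Q4 as canonical and /sales/Q42004 as an alias. The helper name `canonical_uri` is also hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

CANONICAL = "/sales/2004/Q4"       # assumed canonical URI path
ALIASES = {"/sales/Q42004"}        # assumed non-canonical paths

class SalesHandler(BaseHTTPRequestHandler):
    def do_HEAD(self):
        if self.path == CANONICAL:
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
        elif self.path in ALIASES:
            # Point the client at the canonical URI.
            self.send_response(303)
            self.send_header("Location", CANONICAL)
        else:
            self.send_response(404)
        self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass

def canonical_uri(host, port, path):
    """HEAD `path`: return the redirect target on a 303, the path itself
    on a 200, and None for any other response."""
    conn = http.client.HTTPConnection(host, port)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        response.read()
        if response.status == 303:
            return response.getheader("Location")
        return path if response.status == 200 else None
    finally:
        conn.close()

# Start the stand-in server on a free local port.
server = HTTPServer(("127.0.0.1", 0), SalesHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

alias_target = canonical_uri(host, port, "/sales/Q42004")
canonical_target = canonical_uri(host, port, "/sales/2004/Q4")
same = alias_target is not None and alias_target == canonical_target
print(same)

server.shutdown()
server.server_close()
```

The client never compares response bodies. It only asks where each URI leads and checks whether the answers agree.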
Another way is to serve all the URIs as though they were the
same, but give the “canonical” URI in the
Content-Location response header
whenever someone requests a non-canonical URI.
Fetching sales/2004/Q4 might get you the same bytestream as fetching
sales/Q42004, because they’re different URIs
for the same resource: “sales for the last quarter of 2004.” Fetching
releases/1.0.3.tar.gz might give
you the exact same bytestream as fetching
releases/latest.tar.gz, but they’re
different resources because they represent different things: “version
1.0.3” and “the latest version.”
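The difference between "same bytes" and "same resource" can be shown with a short Python sketch. Nothing here talks to a real server: `fetch` is a hypothetical in-memory stand-in, the leading slashes on the paths are my addition, and I assume the server sets Content-Location only on non-canonical URIs, as described above.

```python
TARBALL = b"pretend these bytes are release 1.0.3"   # stand-in data
SALES = b"Q4 2004 sales figures"                      # stand-in data

def fetch(path):
    """Hypothetical in-memory server: return (headers, body) for a path."""
    if path == "/sales/2004/Q4":
        return {}, SALES
    if path == "/sales/Q42004":
        # An alias of the same resource: name the canonical URI.
        return {"Content-Location": "/sales/2004/Q4"}, SALES
    if path in ("/releases/1.0.3.tar.gz", "/releases/latest.tar.gz"):
        # Two different resources that happen to share bytes today.
        return {}, TARBALL
    raise KeyError(path)

def canonical(path):
    """Return the Content-Location value if present, else the path itself."""
    headers, _body = fetch(path)
    return headers.get("Content-Location", path)

def compare(a, b):
    """Return (same_bytes, same_resource) for two paths."""
    same_bytes = fetch(a)[1] == fetch(b)[1]
    same_resource = canonical(a) == canonical(b)
    return same_bytes, same_resource

print(compare("/sales/2004/Q4", "/sales/Q42004"))
print(compare("/releases/1.0.3.tar.gz", "/releases/latest.tar.gz"))
```

Both pairs return identical bytes, but only the sales pair resolves to a single canonical URI. Comparing bytestreams tells you about the data at this moment, not about what the URIs mean.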
Every URI designates exactly one resource. If it designated more than one, it wouldn’t be a Universal Resource Identifier. However, when you fetch a URI the server may send you information about multiple resources: the one you requested and other, related ones. When you fetch a web page, it usually conveys some information of its own, but it also has links to other web pages. When you retrieve an S3 bucket with an Amazon S3 client, you get a document that contains information about the bucket, and information about related resources: the objects in the bucket.