URIs are identifiers of resources that work across the Web. A URI consists of a scheme (such as https), a host (such as www.example.org), a port number followed by a path with one or more segments (such as /users/1234), and a query string. In this chapter, our focus is on designing URIs for RESTful web services:
Use this recipe to learn some commonly practiced URI design conventions.
Use this recipe to learn some dos and don’ts to keep URIs as opaque identifiers.
Treating URIs as opaque identifiers helps decouple clients from servers. This recipe shows techniques that the server can employ to help clients treat URIs as opaque.
Since URIs are a key part of the interface between clients and servers, it is important to keep them “cool,” i.e., stable and permanent. Use this recipe to learn some practices to help keep URIs cool.
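As a rough illustration of the URI components listed at the start of this chapter, the following sketch splits a URI with Python's standard urllib.parse module. The port number (8443) and the query string (view=summary) are made up for the example; only the scheme, host, and path come from the text.

```python
from urllib.parse import urlsplit

# Hypothetical URI: the port and query string are assumptions for illustration.
uri = "https://www.example.org:8443/users/1234?view=summary"

parts = urlsplit(uri)

print(parts.scheme)    # scheme, e.g. https
print(parts.hostname)  # host, e.g. www.example.org
print(parts.port)      # port number as an int, or None if absent
print(parts.path)      # path, e.g. /users/1234
print(parts.query)     # query string without the leading "?"

# Path segments, ignoring the empty string before the leading slash.
segments = [segment for segment in parts.path.split("/") if segment]
print(segments)
```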
URIs are opaque resource identifiers. In most cases, clients need not be concerned with how a server designs its URIs. However, following common conventions when designing URIs has several advantages:
URIs that follow common conventions are usually easier to debug and manage.
Servers can centralize code to extract data from request URIs.
You can avoid spending valuable design and implementation time inventing new conventions and rules for processing URIs.
Partitioning the server’s URIs across domains, subdomains, and paths gives you operational flexibility for load distribution, monitoring, routing, and security.
Use domains and subdomains to logically group or partition resources for localization, distribution, or to enforce various monitoring or security policies.
Avoid including file extensions (such as .php, .aspx, and .jsp) in URIs.
URI design is just one aspect of implementing RESTful applications. Here are some conventions to consider when designing URIs.
As important as URI design is to the success of your web service, it is just as important to keep the time spent in URI design to a minimum. Focus on consistency of URIs instead.
A logical partition of URIs into domains and subdomains provides several operational benefits for server administration. Make sure to use logical names for subdomains while partitioning URIs. For example, the server could offer localized representations via different subdomains, as in the following:
http://en.example.org/book/1234
http://da.example.org/book/1234
http://fr.example.org/book/1234
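A server that partitions by subdomain in this way needs some code to map the host back to a locale. The following is a minimal sketch of one way to do that; the function name locale_for, the set of supported locales, and the fallback to "en" are assumptions for illustration, not part of the recipe.

```python
from urllib.parse import urlsplit

# Locales taken from the example subdomains; a real service would configure this.
SUPPORTED_LOCALES = {"en", "da", "fr"}


def locale_for(uri, default="en"):
    """Return the locale implied by the first label of the URI's host.

    Falls back to `default` (an assumed policy) when the first label
    is not a known locale, for example "www".
    """
    host = urlsplit(uri).hostname or ""
    first_label = host.split(".", 1)[0]
    return first_label if first_label in SUPPORTED_LOCALES else default


print(locale_for("http://da.example.org/book/1234"))
print(locale_for("http://www.example.org/book/1234"))
```

Keeping this lookup in one place means the path (/book/1234) stays identical across locales, and only the host varies.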
Another example is to partition URIs based on the class of clients.
In this example, the server offers two subdomains, one for browsers and the other for custom clients. Such partitioning may let the server allocate different hardware or apply different routing, monitoring, or security policies for HTML and non-HTML representations.
By convention, the forward slash (/) character is used to convey hierarchical relationships. This is not a hard and fast rule, but most users assume this when they scan URIs. In fact, the forward slash is the only character mentioned in RFC 3986 as typically indicating a hierarchical relationship. For example, all the following URIs convey a hierarchical association between path segments:
http://www.example.org/messages/msg123
http://www.example.org/customer/orders/order1
http://www.example.org/earth/north-america/canada/manitoba
Some web services may use a trailing forward slash for collection resources. Use such conventions with care since some development frameworks may incorrectly remove such slashes or add trailing slashes during URI normalization.
If you want to make your URIs easy for humans to scan and interpret, use the underscore (_) or hyphen (-) character:
There is no reason to favor one over the other, but pick one and use it consistently.
Use the ampersand character (&) to separate parameters in the query portion of the URI:
In the first URI shown, the parameters are
landscape. The second URI
has the parameters
Use the comma (,) and semicolon (;) characters to indicate nonhierarchical portions of the URI. The semicolon convention is used to identify matrix parameters:
These characters are valid in the path and query portions of URIs, but not all code libraries recognize the comma and semicolon as separators and may require custom coding to extract these parameters.
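The difference in library support can be seen in a short sketch. Python's standard parse_qs handles ampersand-separated query parameters directly, while matrix parameters need custom code. The URIs, the helper names query_params and matrix_params, and the choice to treat a comma as a separator between multiple values of one parameter are all assumptions for illustration; services may assign the comma a different meaning.

```python
from urllib.parse import urlsplit, parse_qs


def query_params(uri):
    """Parse ampersand-separated query parameters with the standard library."""
    return parse_qs(urlsplit(uri).query)


def matrix_params(segment):
    """Parse one path segment of the form name;key=value;key=v1,v2.

    Custom code: the standard library does not split these for us.
    A comma inside a value is treated here as a list separator (assumed).
    """
    name, *pairs = segment.split(";")
    params = {}
    for pair in pairs:
        if not pair:
            continue  # tolerate a trailing semicolon
        key, _, value = pair.partition("=")
        params[key] = value.split(",") if "," in value else value
    return name, params


# Hypothetical URIs, not taken from the text.
print(query_params("http://www.example.org/search?word=Antarctica&limit=30"))

path = urlsplit("http://www.example.org/photos;color=red;tags=sea,sky").path
last_segment = path.rsplit("/", 1)[-1]
print(matrix_params(last_segment))
```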
http://www.example.org/my-photos/flowers.png
http://www.example.org/index.html
http://www.example.org/api/recent-messages.xml
http://www.example.org/blog/this.is.my.next.post.html
The last example in the previous list is valid but might introduce confusion. Since some code libraries use the period to signal the start of the file extension portion of the URI path, URIs with multiple periods can return unexpected results or might cause a parsing error.
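To make the risk concrete, the sketch below compares two ways of finding the extension in a path that contains several periods. The helper names are hypothetical. posixpath.splitext from the Python standard library splits at the last period, while a naive split at the first period (the kind of shortcut some hand-written code takes) returns something quite different.

```python
import posixpath
from urllib.parse import urlsplit


def extension_by_last_period(uri):
    """Extension as reported by the standard library (splits at the last period)."""
    return posixpath.splitext(urlsplit(uri).path)[1]


def extension_by_first_period(uri):
    """Naive approach: everything after the first period in the last segment."""
    last_segment = urlsplit(uri).path.rsplit("/", 1)[-1]
    _, _, rest = last_segment.partition(".")
    return rest


uri = "http://www.example.org/blog/this.is.my.next.post.html"
print(extension_by_last_period(uri))
print(extension_by_first_period(uri))
```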
Except for legacy reasons, there is no reason to use this character in URIs. Clients should use the media type of the representation to learn how to process the representation. “Sniffing” the media type from extensions can lead to security vulnerabilities. For instance, various versions of Internet Explorer are prone to security vulnerabilities because of their implementation of media type sniffing (http://msdn.microsoft.com/en-us/library/ms775148(VS.85).aspx).
Consider the following URIs:
http://www.example.org/report-summary.xml
http://www.example.org/report-summary.jsp
http://www.example.org/report-summary.aspx
In all three cases, the data is the same and the representation format may be the same, but the file extension indicates the technology used to generate the resource representation. These URIs will need to change if the technology used needs to change.
A literal space is not allowed in a URI; according to RFC 3986, the space character must be percent-encoded as %20. However, the application/x-www-form-urlencoded media type (used by HTML form elements) encodes the space character as the plus sign (+). Consider the following HTML:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html lang="en">
  <head>
    <title>Search</title>
  </head>
  <body>
    <form method="GET" action="http://www.example.org/search"
          enctype="application/x-www-form-urlencoded">
      <label for="phrase">Enter a search phrase</label>
      <input type="text" id="phrase" name="phrase" value=""/>
      <input type="submit" value="Search"/>
    </form>
  </body>
</html>
When a user submits the search phrase “Hadron Supercollider,” the resulting URI (using application/x-www-form-urlencoded rules) would be as follows:
http://www.example.org/search?phrase=Hadron+Supercollider
Code that is not aware of how the URI was generated will interpret the URI using RFC 3986 and treat the value of the search phrase as “Hadron+Supercollider.”
This inconsistency can cause encoding errors for web services that are not prepared to accept URIs encoded using the application/x-www-form-urlencoded media type. This is not just a problem with common web browsers. Some code libraries also apply these rules inconsistently.
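The two encodings can be compared side by side with Python's standard urllib.parse module, which offers separate functions for each rule set. This is only an illustration of the mismatch described above, using the search phrase from the form example; other languages and libraries name these operations differently.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

phrase = "Hadron Supercollider"

# RFC 3986 style: the space becomes %20.
rfc3986_encoded = quote(phrase)

# application/x-www-form-urlencoded style: the space becomes +.
form_encoded = quote_plus(phrase)

print(rfc3986_encoded)
print(form_encoded)

# Decoding the form-encoded value with RFC 3986 rules leaves the plus sign in place.
print(unquote(form_encoded))

# Decoding with form rules recovers the original phrase.
print(unquote_plus(form_encoded))
```

A service that accepts both kinds of clients has to know which rule set produced a given query string before decoding it, since a literal plus sign and an encoded space are otherwise indistinguishable.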
Capital letters in URIs may also cause problems. RFC 3986 defines URIs as case sensitive except for the scheme and host parts. For example, http://www.example.org/my-folder/doc.txt and HTTP://WWW.EXAMPLE.ORG/my-folder/doc.txt are the same, but http://www.example.org/My-Folder/doc.txt is a different URI. However, Windows-based web servers treat these URIs as the same when the resource is served from the filesystem. This case insensitivity does not apply to characters in the query portion. For these reasons, avoid using uppercase characters in URIs.
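The case rules from RFC 3986 can be sketched as a small normalization helper that lowercases only the scheme and host and leaves the path and query untouched. The function name normalize_case is hypothetical, and the sketch assumes the URI carries no user information before the host, since lowercasing the whole authority would alter it.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_case(uri):
    """Lowercase the case-insensitive parts of a URI (scheme and host).

    Assumes no userinfo ("user:password@") in the authority; the path,
    query, and fragment are case sensitive and are left as they are.
    """
    parts = urlsplit(uri)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path,
        parts.query,
        parts.fragment,
    ))


a = normalize_case("http://www.example.org/my-folder/doc.txt")
b = normalize_case("HTTP://WWW.EXAMPLE.ORG/my-folder/doc.txt")
c = normalize_case("http://www.example.org/My-Folder/doc.txt")

print(a == b)  # differ only in scheme and host case
print(a == c)  # differ in path case
```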