Cover by Subbu Allamaraju

Chapter 4. Designing URIs

URIs are identifiers of resources that work across the Web. A URI consists of a scheme (such as http or https), a host (such as www.example.org), an optional port number, a path with one or more segments (such as /users/1234), and an optional query string. In this chapter, our focus is on designing URIs for RESTful web services:

Recipe 4.1

Use this recipe to learn some commonly practiced URI design conventions.

Recipe 4.2

Use this recipe to learn some dos and don’ts to keep URIs as opaque identifiers.

Recipe 4.3

Treating URIs as opaque identifiers helps decouple clients from servers. This recipe shows techniques that the server can employ to help clients treat URIs as opaque.

Recipe 4.4

Since URIs are a key part of the interface between clients and servers, it is important to keep them cool, i.e., stable and permanent. Use this recipe to learn some practices to help keep URIs cool.

4.1. How to Design URIs

URIs are opaque resource identifiers. In most cases, clients need not be concerned with how a server designs its URIs. However, following common conventions when designing URIs has several advantages:

  • URIs that follow conventions are usually easy to debug and manage.

  • Servers can centralize code to extract data from request URIs.

  • You can avoid spending valuable design and implementation time inventing new conventions and rules for processing URIs.

  • Partitioning the server’s URIs across domains, subdomains, and paths gives you operational flexibility for load distribution, monitoring, routing, and security.

Problem

You want to know the best practices to design URIs for resources.

Solution

  • Use domains and subdomains to logically group or partition resources for localization, distribution, or to enforce various monitoring or security policies.

  • Use the forward-slash separator (/) in the path portion of the URI to indicate a hierarchical relationship between resources.

  • Use the comma (,) and semicolon (;) to indicate nonhierarchical elements in the path portion of the URI.

  • Use the hyphen (-) and underscore (_) characters to improve the readability of names in long path segments.

  • Use the ampersand (&) to separate parameters in the query portion of the URI.

  • Avoid including file extensions (such as .php, .aspx, and .jsp) in URIs.

Discussion

URI design is just one aspect of implementing RESTful applications. Here are some conventions to consider when designing URIs.

Warning

As important as URI design is to the success of your web service, it is just as important to keep the time spent in URI design to a minimum. Focus on consistency of URIs instead.

Domains and subdomains

A logical partition of URIs into domains and subdomains provides several operational benefits for server administration. Make sure to use logical names for subdomains while partitioning URIs. For example, the server could offer localized representations via different subdomains, as in the following:

http://en.example.org/book/1234
http://da.example.org/book/1234
http://fr.example.org/book/1234 

Another example is partitioning based on the class of clients:

http://www.example.org/book/1234
http://api.example.org/book/1234 

In this example, the server offers two subdomains, one for browsers and the other for custom clients. Such partitioning may let the server allocate different hardware or apply different routing, monitoring, or security policies for HTML and non-HTML representations.
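As a minimal sketch of how a server or router might act on such a partition, the following Python code reads the first label of the host and maps it to a class of clients. The mapping table and the function name client_class are assumptions made for illustration; a real deployment would more likely express this in load balancer or web server configuration.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from the first host label to a class of clients.
CLIENT_CLASS_BY_SUBDOMAIN = {
    "www": "browser",
    "api": "custom-client",
}


def client_class(uri):
    """Return the client class implied by the subdomain of the URI's host."""
    # urlsplit().hostname is already lowercased; it is None if there is no host.
    host = urlsplit(uri).hostname or ""
    first_label = host.split(".", 1)[0]
    return CLIENT_CLASS_BY_SUBDOMAIN.get(first_label, "unknown")


print(client_class("http://www.example.org/book/1234"))
print(client_class("http://api.example.org/book/1234"))
```

Once the class of client is known, the server can apply a different routing, monitoring, or security policy to each class.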

Forward-slash separator

By convention, the forward slash (/) character is used to convey hierarchical relationships. This is not a hard and fast rule, but most users assume this when they scan URIs. In fact, the forward slash is the only character mentioned in RFC 3986 as typically indicating a hierarchical relationship. For example, all the following URIs convey a hierarchical association between path segments:

http://www.example.org/messages/msg123
http://www.example.org/customer/orders/order1
http://www.example.org/earth/north-america/canada/manitoba 

Some web services may use a trailing forward slash for collection resources. Use such conventions with care since some development frameworks may incorrectly remove such slashes or add trailing slashes during URI normalization.
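The following Python sketch shows one way to read the hierarchy out of a path and to check for a trailing slash before any normalization takes place. The function names are hypothetical, and the URI http://www.example.org/messages/ in the usage lines is an assumed example of a collection resource.

```python
from typing import List
from urllib.parse import urlsplit


def path_segments(uri: str) -> List[str]:
    """Return the non-empty path segments of a URI, in hierarchical order."""
    path = urlsplit(uri).path
    return [segment for segment in path.split("/") if segment]


def has_trailing_slash(uri: str) -> bool:
    """Report whether the path portion of the URI ends with a forward slash."""
    return urlsplit(uri).path.endswith("/")


print(path_segments("http://www.example.org/earth/north-america/canada/manitoba"))
print(has_trailing_slash("http://www.example.org/messages/"))
print(has_trailing_slash("http://www.example.org/messages/msg123"))
```

Note that path_segments discards the trailing slash, which is exactly the kind of normalization that can make a collection URI and a non-collection URI indistinguishable.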

Underscore and hyphen

If you want to make your URIs easy for humans to scan and interpret, use the underscore (_) or hyphen (-) character:

http://www.example.org/blog/this-is-my-first-post
http://www.example.org/my_photos/our_summer_vacation/first_day/setting_up_camp/ 

There is no reason to favor one over the other. Pick one and use it consistently.

Ampersand

Use the ampersand character (&) to separate parameters in the query portion of the URI:

http://www.example.org/print?draftmode&landscape
http://www.example.org/search?word=Antarctica&limit=30 

In the first URI shown, the parameters are draftmode and landscape. The second URI has the parameters word=Antarctica and limit=30.
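As an illustration, the Python standard library can split both query strings on the ampersand. This is a sketch, and the function name query_parameters is an assumption. Parameters without a value, such as draftmode, are kept only when keep_blank_values is set.

```python
from urllib.parse import parse_qs, urlsplit


def query_parameters(uri):
    """Return the query parameters of a URI as a dict of name -> list of values."""
    # keep_blank_values=True keeps value-less parameters such as "draftmode".
    return parse_qs(urlsplit(uri).query, keep_blank_values=True)


print(query_parameters("http://www.example.org/print?draftmode&landscape"))
# {'draftmode': [''], 'landscape': ['']}
print(query_parameters("http://www.example.org/search?word=Antarctica&limit=30"))
# {'word': ['Antarctica'], 'limit': ['30']}
```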

Comma and semicolon

Use the comma (,) and semicolon (;) characters to indicate nonhierarchical portions of the URI. The semicolon convention is used to identify matrix parameters:

http://www.example.org/co-ordinates;w=39.001409,z=-84.578201
http://www.example.org/axis;x=0,y=9 

These characters are valid in the path and query portions of URIs, but not all code libraries recognize the comma and semicolon as separators, so you may need custom code to extract these parameters.
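The kind of custom code involved can be small. The following Python sketch extracts matrix parameters from the last path segment, assuming the layout used in the examples above: a semicolon after the segment name, then comma-separated name=value pairs. The function name matrix_parameters is hypothetical, and other services may use a different layout.

```python
from urllib.parse import unquote, urlsplit


def matrix_parameters(uri):
    """Split the last path segment into its name and its matrix parameters."""
    last_segment = urlsplit(uri).path.rsplit("/", 1)[-1]
    # Everything before the first semicolon is the segment name.
    name, _, raw_params = last_segment.partition(";")
    params = {}
    # Assumption: parameters are separated by commas, as in the examples above.
    for pair in raw_params.split(","):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        params[unquote(key)] = unquote(value)
    return unquote(name), params


print(matrix_parameters("http://www.example.org/axis;x=0,y=9"))
# ('axis', {'x': '0', 'y': '9'})
```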

Full stop, or period

Apart from its use in domain names, the full stop (.), or period, is used to separate the document and file extension portions of the URI:

http://www.example.org/my-photos/flowers.png
http://www.example.org/index.html
http://www.example.org/api/recent-messages.xml
http://www.example.org/blog/this.is.my.next.post.html 

The last example in the previous list is valid but might introduce confusion. Since some code libraries use the period to signal the start of the file extension portion of the URI path, URIs with multiple periods can return unexpected results or might cause a parsing error.
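To see how two reasonable-looking strategies disagree, consider the following Python sketch. Both function names are hypothetical. The first uses the standard library, which treats only the last period as the start of the extension; the second imitates a library that splits at the first period.

```python
import posixpath
from urllib.parse import urlsplit


def extension_from_last_period(uri):
    """Return the extension as found by posixpath.splitext (last period wins)."""
    return posixpath.splitext(urlsplit(uri).path)[1]


def extension_from_first_period(uri):
    """Return everything from the first period of the last path segment onward."""
    last_segment = urlsplit(uri).path.rsplit("/", 1)[-1]
    _, period, rest = last_segment.partition(".")
    return period + rest


uri = "http://www.example.org/blog/this.is.my.next.post.html"
print(extension_from_last_period(uri))   # .html
print(extension_from_first_period(uri))  # .is.my.next.post.html
```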

Other than for legacy reasons, there is no need to use this character in URIs. Clients should use the media type of the representation to learn how to process the representation. Sniffing the media type from extensions can lead to security vulnerabilities. For instance, various versions of Internet Explorer are prone to security vulnerabilities because of their implementation of media type sniffing (http://msdn.microsoft.com/en-us/library/ms775148(VS.85).aspx).

Implementation-specific file extensions

Consider the following URIs:

http://www.example.org/report-summary.xml
http://www.example.org/report-summary.jsp
http://www.example.org/report-summary.aspx 

In all three cases, the data is the same and the representation format may be the same, but the file extension indicates the technology used to generate the resource representation. These URIs will have to change if that technology changes.

Spaces and capital letters

A space cannot appear literally in a URI. According to RFC 3986, the space character must be percent-encoded as %20. However, the application/x-www-form-urlencoded media type (used by HTML form elements) encodes the space character as the plus sign (+). Consider the following HTML:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html lang="en">
  <head>
    <title>Search</title>
  </head>
  <body>
    <form method="GET" action="http://www.example.org/search"
      enctype="application/x-www-form-urlencoded">
      <label for="phrase">Enter a search phrase</label>
      <input type="text" id="phrase" name="phrase" value=""/>
      <input type="submit" value="Search"/>
    </form>
  </body>
</html> 

When a user submits the search phrase “Hadron Supercollider,” the resulting URI (using application/x-www-form-urlencoded rules) would be as follows:

http://www.example.org/search?phrase=Hadron+Supercollider 

Code that is not aware of how the URI was generated will interpret the URI using RFC 3986 and treat the value of the search phrase as “Hadron+Supercollider.”

This inconsistency can cause encoding errors for web services that are not prepared to accept URIs encoded using the application/x-www-form-urlencoded media type. This is not just a problem with common web browsers. Some code libraries also apply these rules inconsistently.
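The difference between the two sets of rules is easy to reproduce. The following sketch uses Python's urllib.parse, whose quote and unquote functions follow the percent-encoding rules, while quote_plus and unquote_plus follow the form-encoding rules.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

phrase = "Hadron Supercollider"

# Encoding: percent-encoding yields %20, form encoding yields a plus sign.
print(quote(phrase))       # Hadron%20Supercollider
print(quote_plus(phrase))  # Hadron+Supercollider

# Decoding a form-encoded value with the wrong rules leaves the plus sign in place.
encoded = "Hadron+Supercollider"
print(unquote(encoded))       # Hadron+Supercollider
print(unquote_plus(encoded))  # Hadron Supercollider
```

A web service that accepts URIs generated by HTML forms needs to decode query values with the form-encoding rules, or the plus sign ends up in the search phrase.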

Capital letters in URIs may also cause problems. RFC 3986 defines URIs as case sensitive except for the scheme and host parts. For example, http://www.example.org/my-folder/doc.txt and HTTP://WWW.EXAMPLE.ORG/my-folder/doc.txt are the same URI, but http://www.example.org/My-Folder/doc.txt is a different one. However, Windows-based web servers treat all three as the same when the resource is served from the filesystem. This case insensitivity does not apply to characters in the query portion. For these reasons, avoid using uppercase characters in URIs.
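The following Python sketch illustrates the comparison rule by lowercasing only the scheme and host before comparing URIs. The function name normalize_case is an assumption, and lowercasing the whole authority is a simplification that would also alter any user information it contains.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_case(uri):
    """Lowercase the scheme and host of a URI, leaving path and query untouched."""
    parts = urlsplit(uri)
    # Simplification: netloc.lower() also lowercases userinfo, if present.
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path,
        parts.query,
        parts.fragment,
    ))


a = normalize_case("http://www.example.org/my-folder/doc.txt")
b = normalize_case("HTTP://WWW.EXAMPLE.ORG/my-folder/doc.txt")
c = normalize_case("http://www.example.org/My-Folder/doc.txt")
print(a == b)  # True: only the scheme and host differ in case
print(a == c)  # False: the path is case sensitive
```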
