The World Wide Web became popular because ordinary people can use it to do really useful things with minimal training. But behind the scenes, the Web is also a powerful platform for distributed computing.
The principles that make the Web usable by ordinary people also work when the “user” is an automated software agent. A piece of software designed to transfer money between bank accounts (or carry out any other real-world task) can accomplish the task using the same basic technologies a human being would use.
As far as this book is concerned, the Web is based on three technologies: the URL naming convention, the HTTP protocol, and the HTML document format. URL and HTTP are simple, but to apply them to distributed programming you must understand them in more detail than the average web developer does. The first few chapters of this book are dedicated to giving you this understanding.
The story of HTML is a little more complicated. In the world of web APIs, there are dozens of data formats competing to take the place of HTML. An exploration of these formats will take up several chapters of this book, starting in Chapter 5. For now, I want to focus on URL and HTTP, and use HTML solely as an example.
I’m going to start off by telling a simple story about the World Wide Web, as a way of explaining the principles behind its design and the reasons for its success. The story needs to be simple because although you’re certainly familiar with the Web, you might not have heard of the concepts that make it work. I want you to have a simple, concrete example to fall back on if you ever get confused about terminology like “hypermedia as the engine of application state.”
Let’s get started.
One day Alice is walking around town and she sees a billboard (Figure 1-1).
(By the way, this fictional billboard advertises a real website that I designed for this book. You can try it out yourself.)
Alice is old enough to remember the mid-1990s, so she recalls the public’s reaction when URLs started showing up on billboards. At first, people made fun of these weird-looking strings. It wasn’t clear what “http://” or “youtypeitwepostit.com” meant. But 20 years later, everyone knows what to do with a URL: you type it into the address bar of your web browser and hit Enter.
And that’s what Alice does: she pulls out her mobile phone and puts http://www.youtypeitwepostit.com/ in her browser’s address bar. The first episode of our story ends on a cliffhanger: what’s at the other end of that URL?
Sorry for interrupting the story, but I need to introduce some basic terminology. Alice’s web browser is about to send an HTTP request to a web server—specifically, to the URL http://www.youtypeitwepostit.com/. One web server may host many different URLs, and each URL grants access to a different bit of the data on the server.
The URL http://www.youtypeitwepostit.com/ identifies a resource—probably the home page of the website advertised on the billboard. But you won’t know for sure until we resume the story and Alice’s web browser sends the HTTP request.
When a web browser sends an HTTP request for a resource, the server sends a document in response (usually an HTML document, but sometimes a binary image or something else). Whatever document the server sends, we call that document a representation of the resource.
So each URL identifies a resource. When a client makes an HTTP request to a URL, it gets a representation of the underlying resource. The client never sees a resource directly.
I’ll talk a lot more about resources and representations in Chapter 3. Right now I just want to use the terms resource and representation to discuss the principle of addressability, to which I’ll now turn.
A URL identifies one and only one resource. If a website has two conceptually different things on it, we expect the site to treat them as two resources with different URLs. We get frustrated when a website violates this rule. Websites for restaurants are especially bad about this. Frequently, the whole site is buried inside a Flash interface and there’s no URL that points to the menu or to the map that shows where the restaurant is located—things we would like to talk about on their own.
The principle of addressability just says that every resource should have its own URL. If something is important to your application, it should have a unique name, a URL, so that you and your users can refer to it unambiguously.
Back to our story. When Alice enters the URL from the billboard into her browser’s address bar, it sends an HTTP request over the Internet to the web server at http://www.youtypeitwepostit.com/:
GET / HTTP/1.1
Host: www.youtypeitwepostit.com
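As a rough illustration of how a client might turn the billboard URL into the request above, here is a minimal Python sketch that uses only the standard library. The function name build_get_request is hypothetical, and a real browser sends many more headers than this.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Return the text of a bare-bones HTTP/1.1 GET request for `url`."""
    parts = urlsplit(url)
    # An empty path means "the root of the site", which HTTP spells "/".
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # HTTP separates lines with CRLF; a blank line ends the headers.
    return f"GET {path} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"


request_text = build_get_request("http://www.youtypeitwepostit.com/")
print(request_text)
```

The host part of the URL ends up in the Host header and the path ends up in the request line, which is how one web server can tell its many URLs apart.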
The web server handles this request (neither Alice nor her web browser needs to know how) and sends a response:
HTTP/1.1 200 OK
Content-type: text/html

<!DOCTYPE html>
<html>
  <head>
    <title>Home</title>
  </head>
  <body>
    <div>
      <h1>You type it, we post it!</h1>
      <p>Exciting! Amazing!</p>
      <p class="links">
        <a href="/messages">Get started</a>
        <a href="/about">About this site</a>
      </p>
    </div>
  </body>
</html>
The 200 at the beginning of the response is a status code, also called a response code. It’s a quick way for the server to tell the client approximately what happened to the client’s request. There are a lot of HTTP status codes, and I cover them all in Appendix A, but the most common one is the one you see here. 200 (OK) means that the request was fulfilled with no problems.
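To make the status line concrete, the following sketch splits it into its parts and looks up the standard reason phrase in Python's http.HTTPStatus enumeration. The helper name parse_status_line is an assumption made for illustration, and the sketch expects a well-formed status line.

```python
from http import HTTPStatus


def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split a status line such as 'HTTP/1.1 200 OK' into its three parts."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


version, code, reason = parse_status_line("HTTP/1.1 200 OK")
# HTTPStatus maps numeric codes to their standard reason phrases.
print(code, HTTPStatus(code).phrase)
```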
Alice’s web browser decodes the response as an HTML document and displays it graphically (see Figure 1-2).
Now Alice can read the web page and understand what the billboard was talking about. It was advertising a microblogging site, similar to Twitter. Not as exciting as advertised on the billboard, but good enough as an example.
Alice’s first real interaction with the web server reveals a couple more important features of the Web.
At this point in the story, Alice’s web browser is displaying the site’s home page. From her perspective, she’s “landed” on that page, which is her current “location” in cyberspace. But as far as the server is concerned, Alice isn’t anywhere. The server has already forgotten about her.
HTTP sessions last for one request. The client sends a request, and the server responds. This means Alice could turn her phone off overnight, and when her browser restored the page from its internal cache, she could click on one of the two links on this page and it would still work. (Compare this to an SSH session, which is terminated if you turn your computer off.)
Alice could leave this web page open in her phone for six months, and when she finally clicks on a link, the web server would respond as if she’d only waited a few seconds. The web server isn’t sitting up late at night worrying about Alice. When she’s not making an HTTP request, the server doesn’t know Alice exists.
This principle is sometimes called statelessness. I think this is a confusing term because the client and the server in this system both keep state; they just keep different kinds of state. The term “statelessness” is getting at the fact that the server doesn’t care what state the client is in. (I’ll talk more about the different kinds of state in the following sections.)
It’s clear from looking at the HTML that this site is more than just a home page. The markup for the home page contains two links: one to the relative URL /about (i.e., to http://www.youtypeitwepostit.com/about) and one to /messages (i.e., http://www.youtypeitwepostit.com/messages). At first Alice only knew one URL—the URL to the home page—but now she knows three. The server is slowly revealing its structure to her.
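The step from a relative URL to an absolute one can be sketched with urljoin from Python's standard library. The variable names here are chosen only for this illustration.

```python
from urllib.parse import urljoin

# The URL of the page that contained the links.
base = "http://www.youtypeitwepostit.com/"

# Each relative link is resolved against the URL of the page it came from.
about_url = urljoin(base, "/about")
messages_url = urljoin(base, "/messages")
print(about_url)
print(messages_url)
```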
We can draw a map of the website so far (Figure 1-3), as revealed to Alice by the server.
What’s on the other end of the /messages and /about links? The only way to be sure is to follow them and find out. But Alice can look at the HTML markup, or her browser’s graphical rendering of the markup, and make an educated guess. The link with the text “About this site” probably goes to a page talking about the site. That’s nice, but the link with the text “Get started” is probably the one that gets her closer to actually posting a message.
When you request a web page, the HTML document you receive doesn’t just give you the immediate information you asked for. The document also helps you answer the question of what to do next.
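A program can pull the same "what to do next" information out of the markup. The sketch below uses Python's html.parser module to list the links in a fragment of the home page. The class name LinkCollector is hypothetical, and the fragment is copied from the response shown earlier.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


home_page = """
<p class="links">
  <a href="/messages">Get started</a>
  <a href="/about">About this site</a>
</p>
"""

collector = LinkCollector()
collector.feed(home_page)
print(collector.links)
```

The parser recovers the structure (two links and their text), but deciding which link to follow is still left to whoever reads the result.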
After reading the home page, Alice decides to give this site a try. She clicks the link that says “Get started.” Of course, whenever you click a link in your web browser, you’re telling your web browser to make an HTTP request.
The code for the link Alice clicked on looks like this:
<a href="/messages">Get started</a>
So her browser makes this HTTP request to the same server as before:
GET /messages HTTP/1.1
Host: www.youtypeitwepostit.com
That GET in the request is an HTTP method, also known as an HTTP verb. The HTTP method is the client’s way of telling the server what it wants to do to a resource. “GET” is the most common HTTP method. It means “give me a representation of this resource.” For a web browser, GET is the default. When you follow a link or type a URL into the address bar, your browser sends a GET request.
The server handles this particular GET request by sending a representation of /messages:
HTTP/1.1 200 OK
Content-type: text/html
...

<!DOCTYPE html>
<html>
  <head>
    <title>Messages</title>
  </head>
  <body>
    <div>
      <h1>Messages</h1>
      <p>
        Enter your message below:
      </p>
      <form action="http://youtypeitwepostit.com/messages" method="post">
        <input type="text" name="message" value="" required="true" maxlength="6"/>
        <input type="submit" value="Post" />
      </form>
      <div>
        <p>
          Here are some other messages, too:
        </p>
        <ul>
          <li><a href="/messages/32740753167308867">Later</a></li>
          <li><a href="/messages/7534227794967592">Hello</a></li>
        </ul>
      </div>
      <p class="links">
        <a href="http://youtypeitwepostit.com/">Home</a>
      </p>
    </div>
  </body>
</html>
As before, Alice’s browser renders the HTML graphically (Figure 1-4).
When Alice looks at the graphical rendering, she sees that this page is a list of messages other people have published on the site. Right at the top there’s an inviting text box and a Post button.
Now we’ve revealed a little more about how the server works. Figure 1-5 shows an updated map of the site, as seen by Alice’s browser.
<form action="http://youtypeitwepostit.com/messages" method="post">
  <input type="text" name="message" value="" required="true" maxlength="6"/>
  <input type="submit" />
</form>
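The form markup above carries everything a client needs to construct its next request: a method, a target URL, and the names of the fields to send. As a minimal sketch, assuming a page with a single form, Python's html.parser can read those details out. The class name FormReader is hypothetical.

```python
from html.parser import HTMLParser


class FormReader(HTMLParser):
    """Record the method, action, and named inputs of a one-form page."""

    def __init__(self):
        super().__init__()
        self.method = None
        self.action = None
        self.fields = {}  # field name -> dict of that input's attributes

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # HTML forms default to GET when no method is given.
            self.method = attrs.get("method", "get").upper()
            self.action = attrs.get("action")
        elif tag == "input" and "name" in attrs:
            self.fields[attrs["name"]] = attrs


form_markup = """
<form action="http://youtypeitwepostit.com/messages" method="post">
  <input type="text" name="message" value="" required="true" maxlength="6"/>
  <input type="submit" />
</form>
"""

reader = FormReader()
reader.feed(form_markup)
print(reader.method, reader.action)
print(reader.fields["message"]["maxlength"])
```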
The HTTP standard (RFC 2616) defines eight methods a client can apply to a resource. In this book, I’ll focus on five of them: GET, HEAD, POST, PUT, and DELETE. In Chapter 3, I’ll cover these methods in detail, along with an extension method, PATCH, designed specifically for use in web APIs. Right now the important thing to keep in mind is that there are a small number of standard methods.
It’s not impossible to come up with a new HTTP method (it happened with PATCH), but it’s a very big deal. This is not like a programming language, where you can name your methods whatever you want. When I built the simple microblogging website for use in this example, I didn’t define new HTTP methods like GETHOMEPAGE and HELLOPLEASESHOWMETHEMESSAGELISTTHANKSBYE. I used GET for both “show the home page” and “show the message list,” because in both cases GET (“give me a representation of this resource”) was the best match between HTTP’s interface and what I wanted to do. I distinguished between the home page and the message list not by defining new methods, but by treating those two documents as separate resources, each with its own URL, each accessible through GET.
Again, Alice’s browser makes an HTTP request:
POST /messages HTTP/1.1
Host: www.youtypeitwepostit.com
Content-type: application/x-www-form-urlencoded

message=Test&submit=Post
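For readers who want to see how such a request might be assembled in code, here is a hedged sketch using Python's urllib. It only builds the request object and sends nothing over the network. The field values are the ones shown in the request body above.

```python
from urllib.parse import urlencode
from urllib.request import Request

# The form fields as they appear in the request body above.
fields = {"message": "Test", "submit": "Post"}
body = urlencode(fields)

# Supplying a body makes urllib choose POST instead of the default GET.
request = Request(
    "http://www.youtypeitwepostit.com/messages",
    data=body.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
print(request.get_method(), request.selector)
print(body)
```

Passing the request object to urllib.request.urlopen would actually send it. That step is left out here so the sketch has no side effects on the server.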
And the server responds with the following:
HTTP/1.1 303 See Other
Content-type: text/html
Location: http://www.youtypeitwepostit.com/messages/5266722824890167
When Alice’s browser made its two GET requests, the server sent the HTTP status code 200 (“OK”) and provided an HTML document for Alice’s browser to render. There’s no HTML document here, but the server did provide a link to another URL, in the Location header. And here, the status code at the beginning of the response is 303 (“See Other”), not 200 (“OK”).
Status code 303 tells Alice’s browser to automatically make a fourth HTTP request, to the URL given in the Location header. Without asking Alice’s permission, her browser does just that:
GET /messages/5266722824890167 HTTP/1.1
Host: www.youtypeitwepostit.com
This time, the server responds with 200 (“OK”) and an HTML document:
HTTP/1.1 200 OK
Content-type: text/html

<!DOCTYPE html>
<html>
  <head>
    <title>Message</title>
  </head>
  <body>
    <div>
      <h2>Message</h2>
      <dl>
        <dt>ID</dt><dd>2181852539069950</dd>
        <dt>DATE</dt><dd>2014-03-28T21:51:08Z</dd>
        <dt>MSG</dt><dd>Test</dd>
      </dl>
      <p class="links">
        <a href="http://www.youtypeitwepostit.com/">Home</a>
      </p>
    </div>
  </body>
</html>
Alice’s browser displays this document graphically (Figure 1-6), and, finally, goes back to waiting for Alice’s input.
I’m sure you’ve encountered HTTP redirects before, but HTTP is full of small features like this, and some may be new to you. There are many ways for the server to tell the client to handle a response differently, and ways for the client to attach conditions or extra features to its request. A big part of API design is the proper use of these features. Chapter 11 covers the features of HTTP that are most important to web APIs, and Appendix A and Appendix B provide supplementary information on this topic.
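The redirect in Alice's story can be sketched as a small decision function: given a response's status code and headers, work out which request, if any, the client should make next. The function name follow_up_request is hypothetical, only the 303 case is handled, and the header lookup assumes the exact spelling "Location" even though real HTTP header names are case-insensitive.

```python
from urllib.parse import urljoin


def follow_up_request(request_url, status, headers):
    """Return the (method, url) to request next, or None if there is none.

    Only the 303 case from the story is handled here.
    """
    if status == 303 and "Location" in headers:
        # 303 means "fetch the other resource with GET", whatever
        # method the original request used.
        return ("GET", urljoin(request_url, headers["Location"]))
    return None


next_step = follow_up_request(
    "http://www.youtypeitwepostit.com/messages",
    303,
    {"Location": "http://www.youtypeitwepostit.com/messages/5266722824890167"},
)
print(next_step)
```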
By looking at the graphical rendering, Alice sees that her message (“Test”) is now a fully fledged post on YouTypeItWePostIt.com. Our story ends here—Alice has accomplished her goal of trying out the microblogging site. But there’s a lot to be learned from these four simple interactions.
Figure 1-7 is a state diagram that shows Alice’s entire adventure from the perspective of her web browser.
When Alice started up the browser on her phone, it didn’t have any particular page loaded. It was an empty slate. Then Alice typed in a URL and a GET request took the browser to the site’s home page. Alice clicked a link, and a second GET request took the browser to the list of messages. She submitted a form, which caused a third request (a POST request). The response to that was an HTTP redirect, which Alice’s browser followed automatically by making a fourth request. Alice’s browser ended up at a web page describing the message Alice had just created.
Every state in this diagram corresponds to a particular page (or to no page at all) being open in Alice’s browser window. In REST terms, we call this bit of information—which page are you on?—the application state.
When you surf the Web, every transition from one application state to another corresponds to a link you decided to follow or a form you decided to fill out. Not all transitions are available from all states. Alice can’t make her POST request directly from the home page, because the home page doesn’t feature the form that allows her browser to construct the POST request.
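One way to picture application state is as a small state machine in which each state lists only the transitions its page offers. The sketch below is an informal model of Alice's journey, not a description of how a browser is built. The state labels are names invented for this example, typing the URL is modeled as a transition out of the blank state, and links to individual messages are left out.

```python
# Application states and the transitions each one offers, as seen in the story.
# The state labels are informal names chosen for this sketch.
TRANSITIONS = {
    "blank": {"GET /": "home page"},
    "home page": {
        "GET /messages": "message list",
        "GET /about": "about page",
    },
    "message list": {
        "GET /": "home page",
        "POST /messages": "message page",  # reached via the 303 redirect
    },
    "message page": {"GET /": "home page"},
}


def step(state: str, action: str) -> str:
    """Follow one transition, refusing any the current state does not offer."""
    options = TRANSITIONS.get(state, {})
    if action not in options:
        raise ValueError(f"{action!r} is not available from {state!r}")
    return options[action]


# Alice's path through the site.
state = "blank"
for action in ["GET /", "GET /messages", "POST /messages"]:
    state = step(state, action)
print(state)
```

Calling step("home page", "POST /messages") raises an error in this model, mirroring the fact that the home page contains no form from which the browser could construct that request.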
Figure 1-8 is a state diagram showing Alice’s adventure from the perspective of the web server.
The server manages two resources: the home page (served from /) and the message list (served from /messages). (The server also manages a resource for each individual message. I’ve omitted those resources from the diagram for the sake of simplicity.) The state of these resources is called, simply enough, resource state.
When the story begins, there are two messages in the message list: “Hello” and “Later.” Sending a GET to the home page doesn’t change resource state, since the home page is a static document that never changes. Sending a GET to the message list won’t change the state either.
But when Alice sends a POST to the message list, it puts the server in a new state. Now the message list contains three messages: “Hello,” “Later,” and “Test.” There’s no way back to the old state, but this new state is very similar. As before, sending a GET to the home page or message list won’t change anything. But sending another POST to the message list will add a fourth message to the list.
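Resource state can be modeled with a few lines of code. The class below is a hypothetical in-memory stand-in for the message list, not the site's actual implementation. It shows GET leaving the state alone and POST moving the resource to a new state, and it keeps no record of which client made which request.

```python
class MessageList:
    """An in-memory stand-in for the resource at /messages."""

    def __init__(self):
        # The two messages present when the story begins.
        self.messages = ["Hello", "Later"]

    def get(self):
        # GET returns a representation (here, a copy) and changes nothing.
        return list(self.messages)

    def post(self, message):
        # POST appends a new message, moving the resource to a new state.
        self.messages.append(message)


resource = MessageList()
before = resource.get()
resource.get()           # a second GET still changes nothing
resource.post("Test")    # Alice's form submission
after = resource.get()
print(before, after)
```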
Because HTTP sessions are so short, the server doesn’t know anything about a client’s application state. The client has no direct control over resource state—all that stuff is kept on the server. And yet, the Web works. It works through REST—representational state transfer.
Application state is kept on the client, but the server can manipulate it by sending representations—HTML documents, in this case—that describe the possible state transitions. Resource state is kept on the server, but the client can manipulate it by sending the server a representation—an HTML form submission, in this case—describing the desired new state.
In the story, Alice made four HTTP requests to YouTypeItWePostIt.com, and she got three HTML documents in return. Although Alice didn’t follow every single link in those documents, we can use those links to build a rough map of the website from the client’s perspective (Figure 1-9).
This is a web of HTML pages. The strands of the web are the HTML <a> tags and <form> tags, each describing a GET or POST HTTP request Alice might decide to make. I call this the principle of connectedness: each web page tells you how to get to the adjoining pages.
The Web as a whole works on the principle of connectedness, which is better known as “hypermedia as the engine of application state,” sometimes abbreviated HATEOAS. I prefer “connectedness” or “the hypermedia constraint,” because “hypermedia as the engine of application state” sounds intimidating. But at this point, you should have no reason to find it intimidating. You know what application state is—it’s which web page a client is on. Hypermedia is the general term for things like HTML links and forms: the techniques a server uses to explain to a client what it can do next.
To say that hypermedia is the engine of application state is to say that we all navigate the Web by filling out forms and following links.
Alice’s story doesn’t seem that exciting, because the World Wide Web has been the dominant Internet application for the past 20 years. But back in the 1990s, this was a very exciting story. If you compare the World Wide Web to its early competitors, you’ll see the difference.
The Gopher protocol (defined in RFC 1436) looks a lot like HTTP, but it lacks addressability. There is no succinct way to identify a specific document in Gopherspace. At least there wasn’t until the World Wide Web took pity on Gopherspace and released the URL standard (first defined in RFC 1738), which provides a gopher:// URL scheme that works just like http://.
FTP, a popular pre-Web protocol for file transfer (defined in RFC 959), also lacks addressability. Until RFC 1738 came along with its ftp:// URL scheme, there simply was no machine-readable way to point to a file on an FTP server. You had to use English prose to explain where the file was. It took the brainpower of a human being just to locate a file on a server. What a waste!
FTP also featured long-lived sessions. A casual user could log on to an FTP server and tie up one of the server’s TCP connections indefinitely. By contrast, even a “persistent” HTTP connection shouldn’t tie up a TCP connection for longer than 30 seconds.
The 1990s saw a lot of Internet protocols for searching different kinds of archives and databases—protocols like Archie, Veronica, Jughead, WAIS, and Prospero. But it turns out we don’t need all those protocols. We just need to be able to send GET requests to different kinds of websites. All these protocols died out or were replaced by websites. Their complex protocol-specific rules were folded into the uniformity of HTTP GET.
Once the Web took over, it became a lot more difficult to justify creating a new application protocol. Why create a new tool that only techies will understand, when you can put up a website that anyone can use? All successful post-Web protocols do something the Web can’t do: peer-to-peer protocols like BitTorrent and real-time protocols like SSH. For most purposes, HTTP is good enough.
The unprecedented flexibility of the Web comes from the principles of REST. In the 1990s, we discovered that the Web works better than its competition. In 2000, Roy T. Fielding’s Ph.D. dissertation explained why this is, coining the term “REST” in the process.
The Fielding dissertation also explains a lot about the problems of web APIs in the 2010s. The simple website I just walked you through is much more sophisticated than most currently deployed web APIs—even self-proclaimed REST APIs. If you’ve ever designed a web API, or written a client for one, you’ve probably encountered some of these problems:
Web APIs frequently have human-readable documentation that explains how to construct URLs for all the different resources. This is like writing English prose explaining how to find a particular file on an FTP server. If websites did this, no one would bother to use the Web.
Instead of telling you what URLs to type in, websites embed URLs in <a> tags and <form> tags: hypermedia controls that you can activate by clicking a link or a button.
In REST terms, putting information about URL construction in separate human-readable documents violates the principles of connectedness and self-descriptive messages.
Lots of websites have help docs, but when was the last time you used them? Unless there’s a serious problem (you bought something and it was never delivered), it’s easier to click around and figure out how the site works by exploring the connected, self-descriptive HTML documents it sends you.
Today’s APIs present their resources in a big menu of options instead of an interconnected web. This makes it difficult to see what one resource has to do with another.
Integrating with a new API inevitably requires writing custom software, or installing a one-off library written by someone else. But you don’t need to write custom software to use a new website. You see a URL on a billboard and plug it into your web browser—the same client you use for every other website in the world.
We’ll never get to the point where a single API client can understand every API in the world. But today’s clients contain a lot of code that really ought to be refactored out into generic libraries. This will only become possible when APIs serve self-descriptive representations.
When APIs change, custom API clients break and have to be fixed. But when a website undergoes a redesign, the site’s users grumble about the redesign and then they adapt. Their browsers don’t stop working.
In REST terms, the website redesign is entirely encapsulated in the self-descriptive HTML documents served by the website. A client that could understand the old HTML documents can understand the new ones.
These are the problems I’m trying to solve with this book. The good news is that it used to be a lot worse. A few years ago, it was common to see RESTful APIs that used safe HTTP methods in unsafe ways, or mixed up application and resource state. This doesn’t happen much anymore. Designs have gotten better, and they can get better still.
Now for the bad news. The story I’ve told you, the story of Alice’s trip through a website, went as smoothly as it did thanks to a very slow and expensive piece of hardware: Alice herself. Every time her browser rendered a web page, Alice, a human being, had to look at the rendered page and decide what to do next. The Web works because human beings make all the decisions about which links to click and which forms to fill out.
The whole point of web APIs is to get things done without making a human sit in front of a web browser all day. How can we program a computer to make the decisions about which links to click? A computer can parse the HTML markup <a href="/messages">Get started</a>, but it can’t understand the phrase “Get started.” Why bother to design APIs that serve self-descriptive messages if those messages won’t be understood by their software consumers?
This is the biggest challenge in web API design: bridging the semantic gap between understanding a document’s structure and understanding what it means. As a shorthand, I’m going to call it the semantic challenge. Very little progress has been made on the semantic challenge, and we will never solve it completely. The good news is that because so little progress has been made so far, the first bit of progress is really easy. We just have to start working together, instead of duplicating each other’s work.
I’ll be checking in with the semantic challenge over the next few chapters, as I talk about the technologies of the Web and how you can use them in API designs. By Chapter 8, we’ll have the tools necessary to tackle the semantic challenge head-on.
 Fielding, Roy Thomas. Architectural Styles and the Design of Network-based Software Architectures. Doctoral dissertation, University of California, Irvine, 2000.