To harness the Web, you need to understand its foundations and design.
We start our journey toward Web APIs at the beginning. In the late 1960s the Defense Advanced Research Projects Agency (DARPA) created the Advanced Research Projects Agency Network (ARPANET), a series of network-based systems that later came to be connected by the TCP/IP protocol. Initially, it was designed for universities and research laboratories in the US to share data (see Figure 1-1).
ARPANET continued to evolve and ultimately led in 1982 to the creation of a global set of interconnected networks known as the Internet. The Internet was built on top of the Internet protocol suite (also known as TCP/IP), which is a collection of communication protocols. Whereas ARPANET was a fairly closed system, the Internet was designed to be a globally open system connecting private and public agencies, organizations, individuals, and institutions.
In 1989, Tim Berners-Lee, a scientist at CERN, invented the World Wide Web, a new system for accessing linked documents via the Internet with a web browser. Navigating the documents of the Web (which were predominantly written in HTML) required a special application protocol, the Hypertext Transfer Protocol (HTTP). This protocol is at the center of what drives websites and Web APIs.
In this chapter we’ll dive into the fundamentals of the web architecture and explore HTTP. This will form a foundation that will assist us as we move forward into actually designing Web APIs.
A resource has a URI that identifies it and that HTTP clients will use to find it. A representation is data that is returned from that resource. Also related and significant is the media type, which defines the format of that data.
A resource is anything that has a URI. The resource itself is a conceptual mapping to one or more entities. In the early years of the Web it was very common for this entity to be a file such as a document or web page. However, a resource is not limited to being file oriented. A resource can be a service that interfaces with anything such as a catalog, a device (e.g., a printer), a wireless garage door opener, or an internal system like a CRM or a procurement system. A resource can also be a streaming medium such as a video or an audio stream.
As was mentioned earlier, each resource is addressable through a unique URI. You can think of a URI as a primary key for a resource. Examples of URIs are http://fabrikam.com/orders/100, http://ftp.fabrikam.com, mailto:John.Doe@example.com, telnet://192.168.1.100, and urn:isbn:978-1-449-33771-1. A URI can correspond only to a single resource, though multiple URIs can point to the same resource. Each URI is of the form scheme:hierarchical part[?query][#fragment], with the query string and fragment being optional. The hierarchical part further consists of an optional authority and a hierarchical path.
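To make the scheme:hierarchical part[?query][#fragment] form concrete, the following minimal Python sketch uses the standard library’s urllib.parse.urlsplit to break some of the example URIs into their parts. The ?status=open#items query and fragment are hypothetical additions for illustration; URIs without an authority, such as the mailto: and urn: examples, simply leave that component empty.

```python
from urllib.parse import urlsplit


def describe_uri(uri: str) -> dict:
    """Split a URI into scheme, authority, path, query, and fragment."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,   # empty when the URI has no "//authority"
        "path": parts.path,
        "query": parts.query,        # optional component
        "fragment": parts.fragment,  # optional component
    }


if __name__ == "__main__":
    for uri in (
        "http://fabrikam.com/orders/100?status=open#items",  # hypothetical query and fragment
        "telnet://192.168.1.100",
        "mailto:John.Doe@example.com",
        "urn:isbn:978-1-449-33771-1",
    ):
        print(uri, "->", describe_uri(uri))
```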
URIs are divided into two categories, URLs and URNs. A URL (Uniform Resource Locator) is an identifier that also refers to the means of accessing the resource, while a URN (Uniform Resource Name) is simply a unique identifier for a resource. Each of the preceding example URIs is also a URL except the last one, which is a URN for this book. It contains no information on how to access the resource but does identify it. In practice, however, the majority of URIs you will likely see will be URLs, and for this reason the two are often used synonymously.
A cool URI is a URI that is simple, easy to remember (like http://www.example.com/people/alice), and doesn’t change. The URI should not change so that existing systems that link to it do not break. So, if your resources are designed with the idea that clients maintain bookmarks to them, you should consider using a cool URI. Cool URIs work particularly well for web pages to which other sites commonly link, or that users often store in their browser favorites. It is not required that URIs be cool. As you’ll see throughout the book, there are benefits to designing APIs without exposing many cool URIs.
A representation is a snapshot of a resource’s state at a point in time. Whenever an HTTP client requests a resource, it is the representation that is returned, not the resource itself. From one request to the next, the resource state can change dramatically, thus the representation that is returned can be very different. For example, imagine an API for developer articles that exposes the top article via the URI http://devarticles.com/articles/top. Instead of returning a link to the content, the API returns a redirect to the actual article. Over time, as the top article changes, the representation (via the redirect) changes accordingly. The resource, however, is not the article in this case; it’s the logic running on the server that retrieves the top article from the database and returns the redirect. It is important to note that each resource can have one or more representations, as you’ll learn about in Content Negotiation.
Each representation has a specific format known as a media type. A media type is a format for passing information across the Internet between clients and servers. It is indicated with a two-part identifier like text/html. Media types serve different purposes. Some are extremely general purpose, like application/json (which is a collection of values or key/value pairs) or text/html (which is primarily for documents rendered in a browser). Other media types have more constrained semantics, like application/collection+json, which is designed specifically for managing feeds and lists. Then there is image/png, which is for PNG images. Media types can also be highly domain specific, like text/vcard, which is used for electronically sharing business card and contact information. For a list of some common media types you may encounter, see Appendix A.
The media type itself actually comprises two parts. The first part (before the slash) is the top-level media type. It describes general type information and common handling rules. Common top-level types include application, image, text, and multipart. The second part is the subtype, which describes a very specific data format. For example, in image/gif, the top-level type tells a client this is an image, while the subtype gif specifies what type of image it is and how it should be handled. It is also common for the subtype to have different variants that share common semantics but are different formats. As an example, HAL (Hypertext Application Language) has JSON (application/hal+json) and XML (application/hal+xml) variants: hal+json means it’s HAL using a JSON wire format, while hal+xml means the XML wire format.
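The two-part structure and the +json/+xml naming convention can be illustrated with a small parser. This is a simplified Python sketch, not a complete implementation of the media type grammar (for example, it does not handle quoted parameter values that contain semicolons); the MediaType and parse_media_type names are hypothetical.

```python
from typing import NamedTuple, Optional


class MediaType(NamedTuple):
    top_level: str          # e.g. "application"
    subtype: str            # e.g. "hal+json"
    suffix: Optional[str]   # wire format after "+", e.g. "json", or None
    params: dict            # e.g. {"charset": "utf-8"}


def parse_media_type(value: str) -> MediaType:
    """Parse a value such as 'application/hal+json; charset=utf-8' (simplified)."""
    type_part, *param_parts = value.split(";")
    top_level, _, subtype = type_part.strip().lower().partition("/")
    if not top_level or not subtype:
        raise ValueError(f"not a media type: {value!r}")
    # The text after the last "+" names the underlying wire format.
    suffix = subtype.rsplit("+", 1)[1] if "+" in subtype else None
    params = {}
    for part in param_parts:
        name, _, val = part.strip().partition("=")
        if name:
            params[name.strip().lower()] = val.strip().strip('"')
    return MediaType(top_level, subtype, suffix, params)


if __name__ == "__main__":
    print(parse_media_type("application/hal+json; charset=utf-8"))
    print(parse_media_type("image/gif"))
```

Both HAL variants share the subtype stem hal and differ only in the suffix, which is what lets a client reuse the same HAL semantics over either wire format.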
Media types are conventionally registered in a central registry managed by IANA, the Internet Assigned Numbers Authority. The registry itself contains a list of media types and links to their associated specifications. The registry is categorized by top-level media types with each top-level section containing a list of specific media types.
Application developers who want to design clients or servers that understand standard media types refer to the registry for the specifications. For example, if you want to build a client that understands image/png, you can navigate to the “image” section of the IANA media types pages and find “png” to get the image/png spec, as shown in Figure 1-3.
Why do we need all these different media types? The reason is that each type has either specific benefits or clients to which it is tailored. HTML is great for laying out documents such as a web page, but not necessarily the best for transferring data. JSON is great for transferring data, but it is a horribly inefficient medium for representing images. PNG is a great image format, but not ideal for scalable vector graphics; for that, we have SVG. Atom, HAL, and Collection+JSON express richer application semantics than raw XML or JSON, but they are more constrained.
Up until this point, you’ve seen the key components of the web architecture. In the next section we will dive into HTTP—the glue that brings everything together.
Now that we have covered the high-level web architecture, our next stop is HTTP. As HTTP is very comprehensive, we will not attempt to cover everything. Rather, we will focus on the major concepts, in particular those that relate to building Web APIs. If you are new to HTTP, this section should give you a good lay of the land. If you are not, you might pick up some things you didn’t know, but it’s also OK to skip it.
HTTP is the application-level protocol for information systems that powers the Web. HTTP was originally authored by three computer scientists: Tim Berners-Lee, Roy Fielding, and Henrik Frystyk Nielsen. It defines a uniform interface for clients and servers to transfer information across a network in a manner that is agnostic to implementation details. HTTP is designed for dynamically changing systems that can tolerate some degree of latency and some degree of staleness. This design allows intermediaries like proxy servers to intercede in communication, providing various benefits like caching, compression, and routing. These qualities make HTTP ideal for the World Wide Web, a massive, constantly evolving network topology with inherent latency. HTTP has also stood the test of time, powering the World Wide Web since the early 1990s.
HTTP is not standing still: it is actively evolving both in how we understand it and how we use it. There have been many misconceptions surrounding the HTTP spec, RFC 2616, due to ambiguities or, in some cases, to things deemed incorrect. The IETF (Internet Engineering Task Force) formed a working group known as httpbis that has created a set of drafts whose sole purpose is to clarify these misconceptions by completely replacing RFC 2616. Additionally, the group has been charged with creating the HTTP 2.0 spec. HTTP 2.0 does not affect any of the public HTTP surface area; rather, it is a set of optimizations to the underlying transport, based on the SPDY protocol. Because httpbis exists as a replacement for the HTTP spec and provides an evolved understanding of HTTP, we’ll use that as the basis for the remainder of this section.
HTTP-based systems exchange messages in a stateless manner using a request/response pattern. We’ll give you a simplified overview of the exchange. First, an HTTP client generates an HTTP request, as shown in Figure 1-4.
That request is a message that includes an HTTP version, a URI of a resource that will be accessed, request headers, an HTTP method (like GET), and an optional entity body (content). The request is then sent to an origin server where the resource resides. The server looks at the URI and HTTP method to decide if it can handle the message. If it can, it looks at the request headers, which contain control information such as a description of the content. The server then processes the message based on that information.
After the server has processed the message, an HTTP response, generally containing a representation of the resource (as shown in Figure 1-5), is generated.
The response contains the HTTP version, response headers, an optional entity body (containing the representation), a status code, and a description. Similar to the server that received the message, the client will inspect the response headers using its control information to process the message and its content.
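As a rough illustration of the message parts just described, the Python sketch below serializes a minimal HTTP/1.1 request and splits a raw response into its version, status code, description, headers, and body. It works on plain strings and opens no network connection; build_request and parse_response are hypothetical helpers, and a real client would rely on an HTTP library instead.

```python
def build_request(method: str, path: str, host: str, headers: dict, body: str = "") -> str:
    """Serialize a minimal HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    if body:
        # Content-Length counts bytes, not characters.
        lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_response(raw: str):
    """Split a raw response into (version, status code, description, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, *rest = status_line.split(" ", 2)
    description = rest[0] if rest else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, int(code), description, headers, body


if __name__ == "__main__":
    print(build_request("GET", "/orders/100", "fabrikam.com", {"Accept": "application/json"}))
    print(parse_response("HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n\r\n{\"id\": 100}"))
```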
Though accurate, the preceding description of HTTP message exchange leaves out an important piece: intermediaries. HTTP is a layered architecture in which each component/server has separation of concerns from others in the system; it is not required for an HTTP client to “see” the origin server. As the request travels along toward the origin server, it will encounter intermediaries, as shown in Figure 1-6, which are agents or components that inspect an HTTP request or response and may modify or replace it. An intermediary can immediately return a response, invoke some sort of process like logging the details, or just let it flow through. Intermediaries are beneficial in that they can improve or enhance communication. For example, a cache can reduce the response time by returning a cached result received from an origin server.
Notice that intermediaries can exist anywhere the request travels between the client and origin server; location does not matter. They can be running on the same machine as the client or origin server or be a dedicated public server on the Internet. They can be built in, such as the browser cache on Windows, or add-ons commonly known as middleware. ASP.NET Web API supports several pieces of middleware that can be used on the client or server, such as handlers and filters, which you will learn about in Chapters 4 and 10.
There are three types of intermediaries that participate in the HTTP message exchange and are visible to clients: proxies, gateways, and tunnels.
HTTP provides a standard set of methods that form the interface for a resource. Since the original HTTP spec was published, the PATCH method has also been approved. As shown earlier in Figure 1-4, the method appears as part of the request itself. Next is a description of the common methods API authors implement.
HEAD is identical to GET, except it returns headers and not the body.
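POST asks the server to process the data sent in the request. If it creates a resource, the server should return a 201 (Created) or a 202 (Accepted) code and return a Location header telling the client where it can find the new resource. If it does not create a resource, it should return a 200 (OK) or a 204 (No Content) code. In practice, POST can handle just about any kind of processing and is not constrained.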
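PUT asks the server to store the data sent at the request URI, replacing the resource’s current state. If the resource exists and is updated, the server should return a 200 (OK) or a 204 (No Content) code. However, if the resource does not exist, the server can create it. If it does, it should return a 201 (Created) code. The main difference between POST and PUT is that POST expects the data that is sent to be processed, while PUT expects the data to be replaced or stored.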
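The difference in status codes between POST and PUT can be sketched with a hypothetical in-memory store. In this Python sketch, POST lets the server choose the new URI and returns 201 with a Location header, while PUT stores the data at the URI the client names, returning 201 on creation and 204 on replacement. OrderStore and its /orders/{id} URIs are assumptions for illustration only.

```python
import itertools


class OrderStore:
    """Hypothetical in-memory resource store illustrating PUT vs. POST status codes."""

    def __init__(self):
        self._items = {}
        self._ids = itertools.count(1)

    def post(self, data):
        # POST: the server processes the data and chooses the new resource's URI.
        new_id = next(self._ids)
        while new_id in self._items:  # skip ids already claimed through PUT
            new_id = next(self._ids)
        self._items[new_id] = data
        return 201, {"Location": f"/orders/{new_id}"}

    def put(self, order_id, data):
        # PUT: the client names the URI and the data is stored there as-is.
        existed = order_id in self._items
        self._items[order_id] = data
        if existed:
            return 204, {}  # replaced; no body returned
        return 201, {"Location": f"/orders/{order_id}"}  # created at the client's URI
```

Repeating the same PUT leaves the store in the same state, which is why a client can safely retry it, whereas each POST here creates a new order.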
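DELETE asks the server to remove the resource. If the deletion succeeds, the server should return a 200 (OK) code when the response includes a body describing the outcome, or a 204 (No Content) code when it does not. If the deletion is still pending, it should return a 202 (Accepted).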
OPTIONS asks the server which communication options are available for a resource. The response typically includes an Allow header specifying which HTTP methods are supported, though the spec leaves the response completely open-ended. For example, it is entirely feasible to list which media types the server supports. OPTIONS can also return a body, supplying further information that cannot be represented in the headers.
PATCH asks the server to apply a partial update to the resource. If the update succeeds, the server should return a 200 (OK) or a 204 (No Content) code. As with PUT, if the resource does not exist, the server can create it. If it does, it should return a code of 201 (Created). A resource that supports PATCH can advertise it in the Allow header of an OPTIONS response. The Accept-Patch header also allows the server to indicate an acceptable list of media types the client can use for sending a PATCH. The spec implies that the media type should carry the semantics to communicate the partial update information to the server. json-patch is a proposed media type in draft that provides a structure for expressing operations within a patch.
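A JSON Patch document is an array of operations, each naming an op, a JSON Pointer path, and (for add and replace) a value; the format was later published as RFC 6902 with the media type application/json-patch+json. The Python sketch below applies only the add, remove, and replace operations, so it is an illustration under those simplifying assumptions rather than a conforming implementation (move, copy, and test are omitted). The function names are hypothetical.

```python
import copy


def _split_pointer(path: str):
    """Split a JSON Pointer such as '/address/city' into unescaped tokens."""
    if path == "":
        return []
    if not path.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {path!r}")
    # "~1" encodes "/" and "~0" encodes "~"; decode in that order.
    return [t.replace("~1", "/").replace("~0", "~") for t in path[1:].split("/")]


def apply_patch(document, operations):
    """Apply add/remove/replace operations and return a new document.

    Simplified subset: move, copy, and test are not handled, and the
    whole-document path "" is not supported.
    """
    doc = copy.deepcopy(document)  # leave the caller's document untouched
    for op in operations:
        *parents, last = _split_pointer(op["path"])
        target = doc
        for token in parents:  # walk down to the container that holds `last`
            target = target[int(token)] if isinstance(target, list) else target[token]
        kind = op["op"]
        if isinstance(target, list):
            index = len(target) if last == "-" else int(last)  # "-" means end of array
            if kind == "add":
                target.insert(index, op["value"])
            elif kind == "remove":
                del target[index]
            elif kind == "replace":
                target[index] = op["value"]
            else:
                raise ValueError(f"unsupported op: {kind}")
        else:
            if kind in ("remove", "replace") and last not in target:
                raise KeyError(last)
            if kind in ("add", "replace"):
                target[last] = op["value"]
            elif kind == "remove":
                del target[last]
            else:
                raise ValueError(f"unsupported op: {kind}")
    return doc
```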
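TRACE asks the server to echo back the request message it received, as the body of a response with the media type message/http. This is useful for diagnostics, as clients can see which proxies the request passed through and how the request may have been modified by intermediaries.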
One of the additional features of HTTP is that it allows clients to make conditional requests. This type of request requires the client to send special headers that provide the server with information it needs to process the request. The headers include If-Match, If-None-Match, If-Unmodified-Since, and If-Modified-Since. Each of these headers is described in further detail in Table B-2 in Appendix B.
A conditional GET is when a client sends headers that the server can use to determine if the client’s cached representation is still valid. If it is, the server returns a 304 (Not Modified) code rather than the representation. A conditional GET reduces network traffic (as the response is much smaller) and also reduces the server workload.
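A server-side view of a conditional GET might look like the following Python sketch. It assumes a hypothetical ETag strategy (a truncated SHA-256 hash of the representation bytes) and compares If-None-Match values weakly, ignoring any W/ prefix, as HTTP specifies for that header. Request headers are assumed to arrive with canonical capitalization, and tags are assumed not to contain commas.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Hypothetical strategy: a strong ETag built from a hash of the representation bytes."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _opaque(tag: str) -> str:
    """Drop a W/ prefix, since If-None-Match compares entity-tags weakly."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_get(body: bytes, request_headers: dict):
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    # If-None-Match may carry a comma-separated list of tags, or "*".
    raw = request_headers.get("If-None-Match", "")
    candidates = {_opaque(t) for t in raw.split(",") if t.strip()}
    if "*" in candidates or etag in candidates:
        # The client's cached copy is still valid: answer 304 without a body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```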
A conditional PUT is when a client sends headers that the server can use to determine if the client’s cached representation is still current. If it is not, the server rejects the update and returns a 412 (Precondition Failed). A conditional PUT is used for concurrency: it allows a client to determine, at the time of doing the PUT, whether another user changed the data.
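The following Python sketch shows the optimistic concurrency pattern behind a conditional PUT using If-Match. The VersionedResource class and its version-counter ETags are hypothetical, and only a single If-Match tag is handled; the point is that an update carrying a stale ETag is rejected with 412 instead of silently overwriting another user’s change.

```python
class VersionedResource:
    """Hypothetical resource whose ETag is a version counter, for optimistic concurrency."""

    def __init__(self, data):
        self.data = data
        self.version = 1

    @property
    def etag(self) -> str:
        return f'"v{self.version}"'

    def conditional_put(self, new_data, if_match: str):
        # If-Match uses strong comparison: the client's tag must equal the current one exactly.
        if if_match != "*" and if_match != self.etag:
            return 412, {}  # Precondition Failed: someone else changed the resource
        self.data = new_data
        self.version += 1  # every successful write produces a new ETag
        return 204, {"ETag": self.etag}
```

If two users both read the resource at "v1", the first PUT succeeds and bumps the ETag to "v2"; the second user’s PUT, still sending "v1", receives 412 and must re-read before retrying.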
Table 1-1 lists the HTTP methods and whether they are safe or idempotent.
Of the methods listed, the most common set used by API builders today is GET, POST, PUT, and DELETE. PATCH, though new, is also becoming very common.
There are several benefits to having a standard set of HTTP methods:
Any HTTP client can interact with a resource through the same uniform interface, and methods like OPTIONS provide discoverability for the client so it can learn how those interactions will take place.
Intermediaries can optimize based on the method. For example, GET requests can be cached; thus, if you do a GET, a caching proxy may be able to return a cached representation rather than having the request travel all the way to the server.
HTTP messages contain header fields that provide information to clients and servers, which they should use to process the request. There are four types of headers: message, request, response, and representation.
Message headers apply to both request and response messages and relate to the message itself rather than the entity body. Examples include Cache-Control, Connection, Transfer-Encoding, and Via.
For a comprehensive list and description of the standard headers in the HTTP specification, see Appendix B.
The HTTP specification continues to be extended. New headers can be proposed and approved by organizations like the IETF (Internet Engineering Task Force) or the W3C (World Wide Web Consortium) as extensions of the HTTP protocol. Two such examples, which are covered in later chapters of the book, are RFC 5861, which introduces new Cache-Control directives (stale-while-revalidate and stale-if-error), and the CORS (Cross-Origin Resource Sharing) specification, which introduces new headers for cross-origin access.
HTTP responses always include a status code and a description of whether the request succeeded; it is the responsibility of an origin server to always return both pieces of information. Together they tell the client whether the request was accepted or failed and suggest possible next actions. The description is human-readable text describing the status code. Status codes range from 1xx to 5xx. Table 1-2 indicates the different categories of status codes and the associated references in httpbis.
1xx (Informational): The request has been received and processing is continuing.
2xx (Successful): The request has been accepted, received, and understood.
3xx (Redirection): Further action is required to complete the request.
4xx (Client Error): The request is invalid and cannot be completed.
5xx (Server Error): The server has failed trying to complete the request.
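Because the first digit alone determines the class, a client can fall back on the class’s general meaning when it meets a status code it does not recognize. A minimal Python sketch of that mapping (the function and table names are hypothetical):

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code: int) -> str:
    """Map a three-digit status code to its class using the first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"status code out of range: {code}")
    return STATUS_CLASSES[code // 100]
```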
Status codes can be directly associated with other headers. In the following snippet, the server has returned a 201, indicating that a new resource was created. The Location header indicates to the client the URI of the created resource. Thus, HTTP clients should automatically check for the Location header in the case of a 201:
HTTP/1.1 201 Created
Cache-Control: no-cache
Pragma: no-cache
Content-Type: application/json; charset=utf-8
Location: http://localhost:8081/api/contacts/6
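A client honoring this convention only needs the status line and the Location header. The Python sketch below, built around a hypothetical created_location helper that operates on a raw response string, returns the new resource’s URI for a 201 and None otherwise; the usage lines feed it the 201 response from this example.

```python
def created_location(raw_response: str):
    """Return the Location URI from a raw '201 Created' response, or None otherwise."""
    head, _, _body = raw_response.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    if status_line.split(" ")[1] != "201":
        return None
    for line in header_lines:
        # Split on the first colon only, so the "http:" in the URI survives.
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":
            return value.strip()
    return None


if __name__ == "__main__":
    raw = "\r\n".join([
        "HTTP/1.1 201 Created",
        "Cache-Control: no-cache",
        "Pragma: no-cache",
        "Content-Type: application/json; charset=utf-8",
        "Location: http://localhost:8081/api/contacts/6",
        "", "",
    ])
    print(created_location(raw))
```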
HTTP servers often have multiple ways to represent the same resources. The representations can be based on a variety of factors, including different capabilities of the client or optimizations based on the payload. For example, you saw how the Contact resource returns a vCard representation tailored to clients such as mail programs. HTTP allows the client to participate in the selection of the media type by informing the server of its preferences. This dance of selection between client and server is what is known as content negotiation, or conneg.
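Server-driven content negotiation is usually based on the Accept header, where the client lists media ranges with optional q-values (preferences from 0 to 1). The following Python sketch picks, from the media types a server can produce, the one the client rates highest, letting the most specific matching range (text/vcard over text/* over */*) determine each q-value. It is a simplification under stated assumptions: media type parameters other than q are ignored, and the server’s own list order breaks ties. The function names are hypothetical.

```python
def parse_accept(header: str):
    """Parse an Accept header into (media_range, q) pairs."""
    ranges = []
    for item in header.split(","):
        media_range, *params = [p.strip() for p in item.split(";")]
        if not media_range:
            continue
        q = 1.0  # default preference when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        ranges.append((media_range.lower(), q))
    return ranges


def _specificity(media_range: str, media_type: str) -> int:
    """How specifically media_range matches media_type (-1 if it does not match)."""
    if media_range == media_type:
        return 2
    top, _, sub = media_range.partition("/")
    if sub == "*" and top == media_type.split("/")[0]:
        return 1
    if media_range == "*/*":
        return 0
    return -1


def negotiate(accept: str, available):
    """Pick the server-supported media type the client prefers most (None if none is acceptable)."""
    ranges = parse_accept(accept) if accept else [("*/*", 1.0)]
    best, best_q = None, 0.0
    for media_type in available:  # earlier entries win ties
        matches = [(_specificity(r, media_type), q) for r, q in ranges]
        matches = [m for m in matches if m[0] >= 0]
        if not matches:
            continue
        q = max(matches)[1]  # the most specific matching range decides the q-value
        if q > best_q:      # q=0 means "not acceptable", so it never wins
            best, best_q = media_type, q
    return best
```

A server that negotiates this way should also send Vary: Accept so that caches know the response depends on that header.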
As we learned in Method properties, some responses are cacheable, in particular the responses for GET and HEAD requests. The main benefit of caching is to improve general performance and scale on the Internet. Caching helps clients by reducing latency, and helps origin servers by reducing their load and network bandwidth.
An HTTP cache is a storage mechanism that manages adding responses from the origin server to the cache, as well as retrieving and removing them. Caches will try to handle only requests that use a cacheable method; all other requests (with noncacheable methods) will be automatically forwarded to the origin server. The cache will also forward to the origin server requests that are cacheable but whose responses are either not present in the cache or expired.
httpbis defines a pretty sophisticated mechanism for caching. Though there are many finer details, HTTP caching is fundamentally based on two concepts: expiration and validation.
A response has expired, or become stale, if its age in the cache is greater than the maximum age, which is specified via a max-age Cache-Control directive in the response. It will also expire if the current date on the cache server exceeds the expiration date, which is specified via the response Expires header. If the response has not expired, it is eligible for the cache to serve it; however, there are other pieces of control data (see Caching and negotiated responses) coming from the request and the cached response that may prevent it from being served.
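The two expiration rules above can be sketched as a freshness check. In this simplified Python sketch, the age is measured from when the response was stored, max-age takes precedence over Expires (as HTTP specifies), and the Age header and heuristic freshness are ignored; an unparseable Expires value is treated as already expired. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def max_age(headers: dict) -> Optional[int]:
    """Return the max-age directive from Cache-Control, if present."""
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def is_fresh(headers: dict, stored_at: datetime, now: datetime) -> bool:
    """Simplified freshness check for a cached response (timezone-aware datetimes)."""
    seconds = max_age(headers)
    if seconds is not None:
        # max-age takes precedence over Expires.
        return now - stored_at < timedelta(seconds=seconds)
    if "Expires" in headers:
        try:
            return now < parsedate_to_datetime(headers["Expires"])
        except (TypeError, ValueError):
            return False  # an invalid Expires date means the response is already stale
    return False  # no explicit expiration information: treated as stale in this sketch


if __name__ == "__main__":
    stored = datetime(2012, 12, 27, 1, 5, 15, tzinfo=timezone.utc)
    headers = {"Cache-Control": "must-revalidate, max-age=3600"}
    print(is_fresh(headers, stored, stored + timedelta(minutes=30)))  # True
```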
When a response has expired, the cache must revalidate it. Validation means the cache will send a conditional GET request (see Conditional requests) to the server asking if the cached response is still valid. The conditional request will contain a cache validator: for example, an If-Modified-Since header with the Last-Modified value of the response and/or an If-None-Match header with the response’s ETag value. If the origin server determines it is still valid, it will return a body-less response with a status code of 304 Not Modified, along with an updated expiration date. If the response has changed, the origin server will return a new response, which will ultimately get served by the cache and replace the current cached representation.
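The cache’s side of validation can be sketched in two steps: derive the validators from the stored response, then merge the origin server’s answer. In this Python sketch a cache entry is a hypothetical dict with headers and body; on a 304 the stored body is kept and the returned headers (such as a new Cache-Control or Expires) refresh the stored ones, while any other response replaces the entry.

```python
def revalidation_headers(cached_headers: dict) -> dict:
    """Build the validators for a conditional GET from a cached response's headers."""
    conditional = {}
    if "ETag" in cached_headers:
        conditional["If-None-Match"] = cached_headers["ETag"]
    if "Last-Modified" in cached_headers:
        conditional["If-Modified-Since"] = cached_headers["Last-Modified"]
    return conditional


def handle_revalidation(cached: dict, status: int, new_headers: dict, new_body: bytes) -> dict:
    """Update a hypothetical cache entry {'headers': ..., 'body': ...} from the origin's answer."""
    if status == 304:
        # Still valid: keep the stored body and refresh metadata such as expiration.
        merged = dict(cached["headers"])
        merged.update(new_headers)
        return {"headers": merged, "body": cached["body"]}
    # Changed: the new response replaces the cached representation.
    return {"headers": new_headers, "body": new_body}
```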
Once a response has been cached, it can also be invalidated. Generally, this will happen because the cache observes a request with an unsafe method to a resource that it has previously cached. Because a request was made that modifies the state of the resource, the cache knows that its representation is invalid. Additionally, if the response to that unsafe request was not an error, the cache should invalidate any responses it has stored for the URIs in that response’s Location and Content-Location headers.
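Invalidation can be sketched as follows, assuming a hypothetical cache that is a dict keyed by absolute URI. Besides the request URI, the Python sketch drops entries for the Location and Content-Location URIs, resolving relative references against the request URI and, as HTTP requires, skipping URIs on a different host.

```python
from urllib.parse import urljoin, urlsplit

SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}


def invalidate(cache: dict, method: str, request_uri: str, status: int, response_headers: dict) -> None:
    """Remove cached entries made stale by an unsafe request.

    `cache` is a hypothetical dict mapping absolute URIs to stored responses.
    """
    if method.upper() in SAFE_METHODS or status >= 400:
        return  # safe methods and error responses leave cached entries alone
    targets = {request_uri}
    request_host = urlsplit(request_uri).netloc
    for name in ("Location", "Content-Location"):
        if name in response_headers:
            uri = urljoin(request_uri, response_headers[name])  # resolve relative references
            if urlsplit(uri).netloc == request_host:           # never invalidate other hosts
                targets.add(uri)
    for uri in targets:
        cache.pop(uri, None)
```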
An entity-tag, or ETag, is a validator for the currently selected representation at a point in time. It is represented as a quoted opaque identifier and should not be parsed by clients.
The server can return an ETag (which it also caches) in the response via the ETag header. A client can save that ETag to use as a validator for a future conditional request, passing the ETag as the value for an If-None-Match header. Note that the client in this case may be an intermediary cache. The server matches up the ETag in the request against the existing ETag it has for the requested resource. If the resource has been modified in the time since the ETag was generated, then the resource’s ETag on the server will have changed and there will not be a match.
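There are two types of ETags: strong ETags, which change whenever the bytes of the representation change, and weak ETags, which carry a W/ prefix and indicate only that two representations are semantically equivalent.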
Strong ETags are the default and should be preferred for conditional requests.
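The difference between the two kinds of ETags shows up in how they are compared. Under strong comparison (used by If-Match), both entity-tags must be strong and identical; under weak comparison (used by If-None-Match), only the quoted opaque values must match. A Python sketch with hypothetical function names:

```python
def _parse_etag(tag: str):
    """Split an entity-tag into (is_weak, opaque_tag)."""
    tag = tag.strip()
    if tag.startswith("W/"):
        return True, tag[2:]
    return False, tag


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: both tags must be strong and identical."""
    weak_a, opaque_a = _parse_etag(a)
    weak_b, opaque_b = _parse_etag(b)
    return not weak_a and not weak_b and opaque_a == opaque_b


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: the opaque tags must match; the W/ prefix is ignored."""
    return _parse_etag(a)[1] == _parse_etag(b)[1]
```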
Caches support the ability to serve negotiated responses through the use of the Vary header. The Vary header allows the origin server to specify one or more header fields that it used as part of performing content negotiation. Whenever a request comes in that matches a representation in the cache that has a Vary header, the new request’s values for those fields must match those of the request that produced the cached representation in order for that representation to be eligible to be served.
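The Vary check a cache performs can be sketched like this in Python. It assumes the cache stored the original request headers alongside the response, and it compares header values as exact strings, whereas real caches may normalize them first; Vary: * is treated as never matching. The helper names are hypothetical.

```python
def vary_fields(cached_response_headers: dict):
    """Header names listed in a cached response's Vary header, lowercased."""
    value = cached_response_headers.get("Vary", "")
    return [f.strip().lower() for f in value.split(",") if f.strip()]


def can_serve(cached_request_headers: dict, cached_response_headers: dict, new_request_headers: dict) -> bool:
    """True if the new request matches the stored one on every field named in Vary."""
    fields = vary_fields(cached_response_headers)
    if "*" in fields:
        return False  # "Vary: *" never matches without going back to the origin server
    # Header names are case-insensitive, so compare them lowercased.
    old = {k.lower(): v for k, v in cached_request_headers.items()}
    new = {k.lower(): v for k, v in new_request_headers.items()}
    return all(old.get(field) == new.get(field) for field in fields)
```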
The following is an example of a response using the Vary header to specify that the Accept header was used:
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Content-Length: 183
Vary: Accept
The Cache-Control header gives cacheability instructions to the caching mechanisms through which a request or response passes. The instructions can be provided either by the origin server as part of the response or by the client as part of the request. The header value is a list of caching directives that specifies things like whether or not the content is cacheable, where it may be stored, what its expiration policy is, and when it should be revalidated or reloaded from the origin server. For example, the no-cache directive tells caches they must always revalidate the cached response before serving it.
The Pragma header can specify a no-cache value that is equivalent to the no-cache Cache-Control directive.
Following is an example of a response using the Cache-Control header. In this case, it specifies that caches may treat the response as fresh for 3,600 seconds (1 hour) after it was generated, as indicated by its Date header. It also specifies that cache servers must revalidate with the origin server once the cached representation has expired before returning it again:
HTTP/1.1 200 OK
Cache-Control: must-revalidate, max-age=3600
Content-Type: application/json; charset=utf-8
Last-Modified: Wed, 26 Dec 2012 22:05:15 GMT
Date: Thu, 27 Dec 2012 01:05:15 GMT
Content-Length: 183
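To see how the directives in this example translate into an expiration time, the Python sketch below parses the Cache-Control value and adds max-age to the Date header. It assumes the response reached the cache without noticeable delay, so its age at the Date value is roughly zero; the helper names are hypothetical.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def parse_cache_control(value: str) -> dict:
    """Parse 'must-revalidate, max-age=3600' into {'must-revalidate': None, 'max-age': '3600'}."""
    directives = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') if sep else None
    return directives


def expires_at(headers: dict):
    """When the response stops being fresh: Date plus max-age (None without both)."""
    directives = parse_cache_control(headers.get("Cache-Control", ""))
    if "max-age" not in directives or "Date" not in headers:
        return None
    return parsedate_to_datetime(headers["Date"]) + timedelta(seconds=int(directives["max-age"]))


if __name__ == "__main__":
    example = {
        "Cache-Control": "must-revalidate, max-age=3600",
        "Last-Modified": "Wed, 26 Dec 2012 22:05:15 GMT",
        "Date": "Thu, 27 Dec 2012 01:05:15 GMT",
    }
    print(parse_cache_control(example["Cache-Control"]))
    print(expires_at(example))
```

For the example response, freshness ends one hour after the Date value; after that, must-revalidate obliges the cache to check with the origin server before serving it again.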
HTTP provides an extensible framework for servers that allows them to protect their resources and allows clients to access them through authentication. Servers can protect one or more of their resources, with each resource being assigned to a logical partition known as a realm. Each realm can have its own authentication scheme, that is, the method of authentication it supports.
Upon receiving a request for a protected resource, the server will return a response with a status 401 Unauthorized (or a status 403 Forbidden if the client is not permitted to access it). A 401 response also contains a WWW-Authenticate header with a challenge, indicating that the client must authenticate to access the resource. The challenge is an extensible token that describes the authentication scheme and additional authentication parameters. For example, the challenge for accessing a protected contacts resource using the HTTP basic authentication scheme consists of the scheme name Basic followed by a realm parameter naming the protected partition.
To explore how this challenge/response mechanism works in more detail, see Appendix E.
In the previous section we learned about the framework for authentication. RFC 2617 then defines two concrete authentication mechanisms: Basic and Digest.
The following is an example of an HTTP Basic challenge response after an attempt to access a protected resource:
HTTP/1.1 401 Unauthorized
...
WWW-Authenticate: Basic realm="Web API Book"
...
As you can see, the server has returned a 401, including a WWW-Authenticate header indicating that the client must authenticate using HTTP Basic. The client then sends back the original request, including the Authorization header, in order to access the protected resource:
GET /resource HTTP/1.1
...
Authorization: Basic QWxpY2U6VGhlIE1hZ2ljIFdvcmRzIGFyZSBTcXVlYW1pc2ggT3NzaWZyYWdl
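The credential in a Basic Authorization header is simply the user-id and password joined by a colon and base64-encoded. It is encoded, not encrypted, so Basic authentication should be used only over TLS. The Python sketch below builds and decodes such a value; decoding the token from the example yields the user-id Alice. It assumes UTF-8 for the credentials, which the original RFC 2617 did not specify, and the helper names are hypothetical.

```python
import base64


def basic_authorization(user_id: str, password: str) -> str:
    """Build an Authorization header value for HTTP Basic: base64 of 'user-id:password'."""
    token = base64.b64encode(f"{user_id}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def parse_basic_authorization(value: str):
    """Recover (user_id, password) from a Basic Authorization header value."""
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    # The user-id cannot contain a colon, so split on the first one only.
    user_id, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user_id, password
```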
There are additional authentication schemes that have appeared since RFC 2617, including vendor-specific mechanisms.
In this chapter we’ve taken a broad-brush approach to surveying the HTTP landscape. The concepts covered were not meant to be complete but rather to help you wade into the pool of HTTP and give you a basic foundation for your ASP.NET Web API development. You’ll notice we’ve included further references for each of the items discussed. These references will prove invaluable as you actually move forward with your Web API development, so keep them in your back pocket! On to APIs!