Though the Web began as a publishing platform, it is now emerging as a means of connecting distributed applications. The Web as a platform is the result of its architectural simplicity, the use of a widely implemented and agreed-upon protocol (HTTP), and the pervasiveness of common representation formats. The Web is no longer just a successful large-scale information system, but a platform for an ecosystem of services.
But how can resources, identifiers, document formats, and a protocol make such an impression? Why, even after the dot-com bubble, are we still interested in it? What do enterprises—with their innate tendency toward safe middleware choices from established vendors—see in it? What is new that changes the way we deliver functionality and integrate systems inside and outside the enterprise?
As developers, we build solutions on top of platforms that solve or help with hard distributed computing problems, leaving us free to work on delivering valuable business functionality. Hopefully, this book will give you the information you need in order to make an informed decision on whether the Web fits your problem domain, and whether it will help or hinder delivering your solution. We happen to believe that the Web is a sensible solution for the majority of the distributed computing problems encountered in business computing, and we hope to convince you of this view in the following chapters. But for starters, here are a number of reasons we’re such web fans.
An application platform isn’t of much use unless it’s supported by software libraries and development toolkits. Today, practically all operating systems and development platforms provide some kind of support for web technologies (e.g., .NET, Java, Perl, PHP, Python, and Ruby). Furthermore, the capabilities to process HTTP messages, deal with URIs, and handle XML or JSON payloads are all widely implemented in web frameworks such as Ruby on Rails, Java servlets, PHP Symfony, and ASP.NET MVC. Web servers such as Apache and Internet Information Server provide runtime hosting for services.
Underpinned by HTTP, the web architecture supports a global deployment of networked applications. But the massive volume of blogs, mashups, and news feeds wouldn’t have been possible if it weren’t for the way in which the Web and HTTP constrain solutions to a handful of scalable patterns and practices.
Scalability and performance are quite different concerns. Naively, it would seem that if latency and bandwidth are critical success factors for an application, using HTTP is not a good option. We know that there are messaging protocols with far better performance characteristics than HTTP’s text-based, synchronous, request-response behavior. Yet this is an inequitable comparison, since HTTP is not just another messaging protocol; it’s a protocol that implements some very specific application semantics. The HTTP verbs (and GET in particular) support caching, which translates into reduced latency, enabling massive horizontal scaling for large aggregate throughput.
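To make the caching point concrete, here is a minimal sketch of a cache that sits in front of an origin service and stores only GET responses. The `TinyCache` class, the `origin` function, and the fixed `max_age` freshness window are all assumptions made for illustration; real HTTP caches decide freshness from response headers rather than from a single constant.

```python
import time


class TinyCache:
    """A toy cache that stores only GET responses, keyed by URI."""

    def __init__(self, origin, max_age=60.0, clock=time.monotonic):
        self.origin = origin      # callable(method, uri) -> response body
        self.max_age = max_age    # seconds a stored response stays fresh (assumed)
        self.clock = clock
        self.store = {}           # uri -> (stored_at, body)

    def request(self, method, uri):
        if method != "GET":
            # Other verbs always reach the origin and drop any stored copy.
            self.store.pop(uri, None)
            return self.origin(method, uri)
        entry = self.store.get(uri)
        if entry is not None and self.clock() - entry[0] < self.max_age:
            return entry[1]       # fresh hit: the origin is not contacted
        body = self.origin(method, uri)
        self.store[uri] = (self.clock(), body)
        return body


calls = []  # records every request that actually reaches the origin


def origin(method, uri):
    """Hypothetical origin service; numbers its responses for visibility."""
    calls.append((method, uri))
    return f"{method} {uri} #{len(calls)}"


cache = TinyCache(origin)
first = cache.request("GET", "/orders/1")
second = cache.request("GET", "/orders/1")  # served from the cache
```

Only the first GET reaches the origin. Because every intermediary can apply the same rule without knowing anything about the application, copies of a representation can be served from many places at once.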
As developers ourselves, we understand why one might believe that asynchronous message-centric solutions are the most scalable and highest-performing options. However, existing high-performance and highly available services on the Web are proof that a synchronous, text-based request-response protocol can provide good performance and massive scalability when used correctly.
The Web combines a widely shared vision for how to use HTTP efficiently and how to federate load through a network. It may sound incredible, but through the remainder of this book, we hope to demonstrate this paradox beyond doubt.
The Web is loosely coupled, and correspondingly scalable. The Web does not try to incorporate in its architecture and technology stack any of the traditional quality-of-service guarantees, such as data consistency, transactionality, referential integrity, statefulness, and so on. This deliberate lack of guarantees means that browsers sometimes try to retrieve nonexistent pages, mashups can’t always access information, and business applications can’t always make immediate progress. Such failures are part of our everyday lives, and the Web is no different. Just like us, the Web needs to know how to cope with unintended outcomes or outright failures.
A software agent may be given the URI of a resource on the Web, or it might retrieve it from the list of hypermedia links inside an HTML document, or find it after a business-to-business XML message interaction. But a request to retrieve the representation of that resource is never guaranteed to be successful. Unlike other contemporary distributed systems architectures, the Web’s blueprints do not provide any explicit mechanisms to support information integrity. For example, if a service on the Web decides that a URI is no longer going to be associated with a particular resource, there is no way to notify all those consumers that depend on the old URI–resource association.
This is an unusual stance, but it does not mean that the Web is neglectful; far from it. HTTP defines response codes that can be used by service providers to indicate what has happened. To communicate that “the resource is now associated with a new URI,” a service can use a status code such as 301 Moved Permanently or 303 See Other. The Web always tries to help move us toward a successful conclusion, but without introducing tight coupling.
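As a rough sketch of how a consumer can cope with a changed URI, the following client follows redirection responses until it reaches a final answer. The in-memory `SERVICE` table, its URIs, and the `fetch` helper are hypothetical stand-ins for a real HTTP exchange; only the status codes and the `Location` header come from HTTP itself.

```python
# Hypothetical in-memory "service": URI -> (status, headers, body).
SERVICE = {
    "/orders/old-42": (301, {"Location": "/orders/42"}, ""),
    "/orders/42": (200, {}, "order 42"),
}


def fetch(uri):
    """Stand-in for an HTTP GET; unknown URIs answer 404."""
    return SERVICE.get(uri, (404, {}, ""))


def get_following_redirects(uri, max_hops=5):
    """Follow 301/303 responses via Location, up to max_hops redirects."""
    for _ in range(max_hops + 1):
        status, headers, body = fetch(uri)
        if status in (301, 303) and "Location" in headers:
            uri = headers["Location"]  # retry at the URI the service named
            continue
        return status, uri, body
    raise RuntimeError("too many redirects")


result = get_following_redirects("/orders/old-42")
```

The service never has to track its consumers. A consumer holding the old URI learns about the new one at the moment it asks, and a consumer holding a dead URI simply receives a 404 and must decide for itself what to do next.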
Although business processes can be modeled and exposed through web resources, HTTP does not provide direct support for such processes. There is a plethora of work on vocabularies to capture business processes (e.g., BPEL, WS-Choreography), but none of them has really embraced the Web’s architectural principles. Yet the Web—and hypermedia specifically—provides a great platform for modeling business-to-business interactions.
Instead of reaching for extensive XML dialects to construct choreographies, the Web allows us to model state machines using HTTP and hypermedia-friendly formats such as XHTML and Atom. Once we understand that the states of a process can be modeled as resources, it’s simply a matter of describing the transitions between those resources and allowing clients to choose among them at runtime.
This isn’t exactly new thinking, since HTML does precisely this for the human-readable Web through the <a href="…"> tag. Although implementing hypermedia-based solutions for computer-to-computer systems is a new step for most developers, we’ll show you how to embrace this model in your systems to support loosely coupled business processes (i.e., behavior, not just data) over the Web.
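The idea of states as resources and transitions as links can be sketched in a few lines. The order workflow below, including its URIs, state names, and link relation names, is invented purely for illustration, and a plain dictionary stands in for the representations a service would return.

```python
# Hypothetical order workflow: each state is a resource, and each
# representation lists the transitions (links) available from it.
RESOURCES = {
    "/orders/1": {
        "state": "unpaid",
        "links": {"payment": "/orders/1/payment", "cancel": "/orders/1/cancelled"},
    },
    "/orders/1/payment": {"state": "paid", "links": {"receipt": "/orders/1/receipt"}},
    "/orders/1/receipt": {"state": "completed", "links": {}},
    "/orders/1/cancelled": {"state": "cancelled", "links": {}},
}


def follow(start_uri, preferred_rels, max_steps=10):
    """Walk the workflow by choosing, at each state, the first link the
    client understands. Returns the list of states visited."""
    uri = start_uri
    visited = [RESOURCES[uri]["state"]]
    for _ in range(max_steps):
        links = RESOURCES[uri]["links"]
        rel = next((r for r in preferred_rels if r in links), None)
        if rel is None:
            break                 # no transition the client wants or knows
        uri = links[rel]
        visited.append(RESOURCES[uri]["state"])
    return visited


happy_path = follow("/orders/1", ["payment", "receipt"])
cancelled = follow("/orders/1", ["cancel"])
```

The client hardcodes only the entry URI and the link relations it understands. Every other URI is discovered at runtime, so the service can rearrange the workflow without breaking consumers that follow links.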
To the Web, one representation looks very much like another. The Web doesn’t care if a document is encoded as HTML and carries weather information for on-screen human consumption, or as an XML document conveying the same weather data to another application for further processing. Irrespective of the format, they’re all just resource representations.
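A small sketch may help show what "just resource representations" means in practice: one piece of resource state rendered in two formats, selected by media type. The `represent` function and the weather fields are assumptions for illustration; a real service would typically choose the format from the request's Accept header.

```python
from html import escape
from xml.etree import ElementTree as ET


def represent(weather, media_type):
    """Render the same resource state in the requested format."""
    if media_type == "application/xml":
        root = ET.Element("weather")
        for key, value in weather.items():
            ET.SubElement(root, key).text = str(value)
        return ET.tostring(root, encoding="unicode")
    if media_type == "text/html":
        items = "".join(
            f"<li>{escape(key)}: {escape(str(value))}</li>"
            for key, value in weather.items()
        )
        return f"<ul>{items}</ul>"
    raise ValueError(f"unsupported media type: {media_type}")


weather = {"city": "Oslo", "temp_c": 4}  # hypothetical resource state
as_xml = represent(weather, "application/xml")
as_html = represent(weather, "text/html")
```

Both strings describe the same resource. Which one a consumer receives is a matter of format, not of identity.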
The principle of uniformity and least surprise is a fundamental aspect of the Web. We see this in the way the number of permissible operations is constrained to a small set, the members of which have well-understood semantics. By embracing these constraints, the web community has developed myriad creative ways to build applications and infrastructure that support information exchange and application delivery over the Web.
Caches and proxy servers work precisely because of the widely
understood caching semantics of some of the HTTP verbs—in particular,
GET. The Web’s underlying infrastructure
enables reuse of software tools and development libraries to provide
an ecosystem of middleware services, such as caches, that support
performance and scaling. With plumbing that understands the
application model baked right into the network, the Web allows
innovation to flourish at the edges, with the heavy lifting being
carried out in the cloud.
This focus on resources, identifiers, HTTP, and formats as the building blocks of the world’s largest distributed information system might sound strange to those of us who are used to building distributed applications around remote method invocations, message-oriented middleware platforms, interface description languages, and shared type systems. We have been told that distributed application development is difficult and requires specialist software and skills. And yet web proponents constantly talk about simpler approaches.
Traditionally, distributed systems development has focused on exposing custom behavior in the form of application-specific interfaces and interaction protocols. Conversely, the Web focuses on a few well-known network actions (those now-familiar HTTP verbs) and the application-specific interpretation of resource representations. URIs, HTTP, and common representation formats give us reach—straightforward connectivity and ubiquitous support from mobile phones and embedded devices to entire server farms, all sharing a common application infrastructure.