Cover by Sam Ruby, Leonard Richardson

SOAP

SOAP is the foundation on which the plethora of WS-* specifications is built. Despite the hype and antihype it’s been subjected to, there’s amazingly little to this specification. You can take any XML document (so long as it doesn’t have a DOCTYPE or processing instructions), wrap it in two little XML elements, and you have a valid SOAP document. For best results, though, the document’s root element should be in a namespace.

Here’s an XML document:

<hello-world xmlns="http://example.com"/>

Here’s the same document, wrapped in a SOAP envelope:

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <hello-world xmlns="http://example.com"/>
  </soap:Body>
</soap:Envelope>

The only catch is that the SOAP Envelope must have the same character encoding as the document it encloses. That’s pretty much all there is to it. Wrapping an XML document in two extra elements is certainly not an unreasonable or onerous task, but it doesn’t exactly solve all the world’s problems either.
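
To make the wrapping step concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The function name wrap_in_soap is made up for this illustration, and the serializer is free to choose its own prefix for the payload's namespace, so the output is equivalent to the envelope shown earlier rather than byte-for-byte identical.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace, as used in the examples in this section.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("soap", SOAP_NS)


def wrap_in_soap(document: ET.Element) -> ET.Element:
    """Return a SOAP Envelope whose Body contains the given element.

    wrap_in_soap is a hypothetical helper, not part of any SOAP toolkit.
    """
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(document)
    return envelope


payload = ET.fromstring('<hello-world xmlns="http://example.com"/>')
envelope = wrap_in_soap(payload)
print(ET.tostring(envelope, encoding="unicode"))
```

The two extra elements are the whole of the transformation; nothing in the payload itself changes.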

Seem too simple? Here’s a real-world example. In Example 1-8 I showed you an elided version of a SOAP document you might submit to Google’s web search service. Example 10-1 shows the whole document.

Example 10-1. A SOAP envelope to be submitted to Google’s SOAP search service

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <gs:doGoogleSearch xmlns:gs="urn:GoogleSearch">
      <key>00000000000000000000000000000000</key>
      <q>REST book</q>
      <start>0</start>
      <maxResults>10</maxResults>
      <filter>true</filter>
      <restrict/>
      <safeSearch>false</safeSearch>
      <lr/>
      <ie>latin1</ie>
      <oe>latin1</oe>
    </gs:doGoogleSearch>
  </soap:Body>
</soap:Envelope>

This document describes a Call to the Remote Procedure gs:doGoogleSearch. All of the query parameters are neatly tucked into named elements. This example is fully functional, though if you POST it to Google you’ll get back a fault document saying that the key is not valid.

This style of encoding parameters to a remote function is sometimes called RPC/literal or Section 5 encoding. That’s the section in the SOAP 1.1 specification that shows how to use SOAP for RPC. But over time, fashions change. Later versions of the specification made support of this encoding optional, and so it’s now effectively deprecated. It was largely replaced by an encoding called document/literal, and then by wrapped document/literal. Wrapped document/literal looks largely the same as Section 5 encoding, except that the parameters tend to be scoped to a namespace.

One final note about body elements: the parameters may be annotated with data type information based on XML Schema Data Types. This annotation goes into attributes, and generally reduces the readability of the document. Instead of <ie>latin1</ie> you might see <ie xsi:type="xsd:string">latin1</ie>. Multiply that by the number of arguments in Example 10-1 and you may start to see why many recoil in horror when they hear “SOAP.”

In Chapter 1 I said that HTTP and SOAP are just different ways of putting messages in envelopes. HTTP’s main moving parts are the entity-body and the headers. With a SOAP element named Body, you might expect to also find a Header element. You’d be right. Anything that can go into the Body element—any namespaced document which has no DOCTYPE or processing instructions—can go into the Header. But while you tend to only find a single element inside the Body, the Header can contain any number of elements. Header elements also tend to be small.

Recalling the terminology used in “HTTP: Documents in Envelopes” in Chapter 1, headers are like “stickers” on an envelope. SOAP headers tend to contain information about the data in the body, such as security and routing information. The same is true of HTTP headers.

SOAP defines two attributes for header entries: actor and mustUnderstand. If you know in advance that your message is going to pass through intermediaries on the way to its destination, you can identify (via a URI) the actor that’s the target of any particular header. The mustUnderstand attribute is used to impose restrictions on those intermediaries (or on the final destination). If the actor doesn’t understand a header addressed to it, and mustUnderstand is true, it must reject the message, even if it thinks it could handle the message otherwise. An example of this would be a header associated with a two-phase commit operation. If the destination doesn’t understand two-phase commit, you don’t want the operation to proceed.
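
The mustUnderstand rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the actor URI, the tx and audit namespaces, and the function name headers_to_reject are all invented for the example, and a header with no actor attribute is treated as addressed to the final destination.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# Hypothetical values, chosen only for this illustration.
MY_ACTOR = "http://example.com/actors/ledger"
UNDERSTOOD = {"{http://example.com/audit}trace"}

MESSAGE = """
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <tx:two-phase-commit xmlns:tx="http://example.com/tx"
        soap:actor="http://example.com/actors/ledger"
        soap:mustUnderstand="1"/>
  </soap:Header>
  <soap:Body>
    <hello-world xmlns="http://example.com"/>
  </soap:Body>
</soap:Envelope>
"""


def headers_to_reject(envelope, actor, understood, is_final_destination=False):
    """List the headers that oblige this actor to reject the message."""
    header = envelope.find(f"{{{SOAP_NS}}}Header")
    if header is None:
        return []
    rejected = []
    for entry in header:
        target = entry.get(f"{{{SOAP_NS}}}actor")
        must_understand = entry.get(f"{{{SOAP_NS}}}mustUnderstand") == "1"
        # A header with no actor is assumed to target the final destination.
        addressed_to_me = target == actor or (
            target is None and is_final_destination
        )
        if addressed_to_me and must_understand and entry.tag not in understood:
            rejected.append(entry.tag)
    return rejected


problems = headers_to_reject(ET.fromstring(MESSAGE), MY_ACTOR, UNDERSTOOD)
print(problems)
```

Because this actor does not list the two-phase commit header among those it understands, the header shows up in the result, and a real processor would respond with a fault instead of handling the body.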

Beyond that, there isn’t much to SOAP. Requests and responses have the same format, similar to HTTP. There’s a separate format for a SOAP Fault, used to signify an error condition. Right now the only thing that can go into a SOAP document is an XML document. There have been a few attempts to define mechanisms for attaching binary data to messages, but no clear winner has emerged.
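
As a rough illustration of the fault format, the Python sketch below pulls the faultcode and faultstring out of a SOAP 1.1 response. The response text is invented for the example (it is not a captured reply from any real service), and extract_fault is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# An invented fault document; the faultstring is illustrative only.
RESPONSE = """
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <soap:Fault>
      <faultcode>soap:Client</faultcode>
      <faultstring>Invalid key (illustrative text)</faultstring>
    </soap:Fault>
  </soap:Body>
</soap:Envelope>
"""


def extract_fault(envelope):
    """Return (faultcode, faultstring), or None if the Body holds no Fault."""
    fault = envelope.find(f"{{{SOAP_NS}}}Body/{{{SOAP_NS}}}Fault")
    if fault is None:
        return None
    # In SOAP 1.1 the children of Fault are not namespace-qualified.
    return fault.findtext("faultcode"), fault.findtext("faultstring")


fault = extract_fault(ET.fromstring(RESPONSE))
print(fault)
```

Note that the error travels inside an ordinary envelope, so a client has to look into the body to learn that the request failed.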

Given this fairly simple protocol, what’s the basis for the hype and controversy? SOAP is mainly infamous for the technologies built on top of it, and I’ll cover those next. It does have one alleged benefit of its own: transport independence. The headers are inside the message, which means they’re independent of the protocol used to transport the message. You don’t have to send a SOAP envelope inside an HTTP envelope. You can send it over email, instant messaging, raw TCP, or any other protocol. In practice, this feature is rarely used. There’s been some limited public use of SMTP transports, and some use of JMS transports behind the corporate firewall, but the overwhelming majority of SOAP traffic is over HTTP.

The Resource-Oriented Alternative

SOAP is almost always sent over HTTP, but SOAP toolkits make little use of HTTP status codes, and tend to coerce all operations into POST methods. This is not technically disallowed by the REST architectural style, but it’s a degenerate sort of RESTful architecture that doesn’t get any of the benefits REST is supposed to provide. Most SOAP services support multiple operations on diverse data, all mediated through POST on a single URI. This isn’t resource-oriented: it’s RPC-style.

The single most important change you can make is to split your service into resources: identify every “thing” in your service with a separate URI. Pretty much every SOAP toolkit in existence provides access to this information, so use it! Put the object reference up front. Such usages may not feel idiomatic at first, but if you stop and think about it, this is what you’d expect to be doing if SOAP were really a Simple Object Access Protocol. It’s the difference between object-oriented programming in a function-oriented language like C:

my_function(object, argument);

and in an object-oriented language like C++:

object->my_method(argument);

When you move the scoping information outside the parentheses (or, in this case, the Envelope), you’ll soon find yourself identifying large numbers of resources with common functionality. You’ll want to refactor your logic to exploit these commonalities.
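
The same contrast can be sketched at the HTTP level. In the Python below, the endpoint path /soap-endpoint, the URI layout /search/..., and the doSearch element are all hypothetical, chosen only to show where the scoping information ends up in each style.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def rpc_style_request(query):
    """RPC style: one URI for everything; the target is named inside the body."""
    body = f"<doSearch><q>{escape(query)}</q></doSearch>"
    return ("POST", "/soap-endpoint", body)


def resource_style_request(query):
    """Resource-oriented style: the URI itself identifies the thing requested."""
    return ("GET", "/search/" + quote(query, safe=""), None)


print(rpc_style_request("REST book"))
print(resource_style_request("REST book"))
```

In the second form, intermediaries and caches can tell what is being requested without parsing an envelope, which is what makes the later steps possible.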

The next most important change has to do with the object-oriented concept of polymorphism. You should try to make objects of different types respond to method calls with the same name. In the world of the Web, this means (at a minimum) supporting HTTP’s GET method. Why is this important? Think about a programming language’s standard library. Pretty much every object-oriented language defines a standard class hierarchy, and at its root you find an Object class which defines a toString method. The details are different for every language, but the result is always the same: every object has a method that provides a canonical representation of the object. The GET method provides a similar function for HTTP resources.

Once you do this, you’ll inevitably notice that the GET method is used more heavily than all the other methods you have provided. Combined. And by a wide margin. That’s where conditional GET and caching come in. Implement these standard features of HTTP, make your representations cacheable, and you make your application more scalable. That has direct and tangible economic benefits.
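
Here is a minimal Python sketch of conditional GET on the server side, assuming the entity tag is simply a hash of the representation and that the client sends at most one tag in If-None-Match. The function name conditional_get and the 60-second cache lifetime are arbitrary choices for the illustration.

```python
import hashlib
from typing import Optional


def conditional_get(representation: bytes, if_none_match: Optional[str]):
    """Return (status, headers, body) for a GET on one resource."""
    # Assumption: a strong ETag derived from the bytes of the representation.
    etag = '"' + hashlib.sha256(representation).hexdigest() + '"'
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}
    if if_none_match == etag:
        # The client's copy is current, so send no entity-body.
        return 304, headers, b""
    return 200, headers, representation


# First request: the client has nothing cached.
status, headers, body = conditional_get(b"<results/>", None)

# Second request: the client replays the ETag it was given.
status_again, _, body_again = conditional_get(b"<results/>", headers["ETag"])
print(status, status_again)
```

The second response carries no body at all, which is where the bandwidth savings come from when GET dominates the traffic.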

Once you’ve done these three simple things, you may find yourself wanting more. Chapter 8 is full of advice on these topics.
