Throughout this book I’ve said that web services are based on three fundamental technologies: HTTP, URIs, and XML. But there are also lots of technologies that build on top of these. You can usually save yourself some work and broaden your audience by adopting these extra technologies: perhaps a domain-specific XML vocabulary, or a standard set of rules for exposing resources through HTTP’s uniform interface. In this chapter I’ll show you several technologies that can improve your web services. Some you’re already familiar with and some will probably be new to you, but they’re all interesting and powerful.
What representation formats should your service actually send and receive? This is the question of how data should be represented, and it’s an epic question. I have a few suggestions, which I present here in a rough order of precedence. My goal is to help you pick a format that says something about the semantics of your data, so you don’t find yourself devising yet another one-off XML vocabulary that no one else will use.
I assume your clients can accept whatever representation format you serve. The known needs of your clients take priority over anything I can say here. If you know your data is being fed directly into Microsoft Excel, you ought to serve representations in Excel format or a compatible CSV format. My advice also does not extend to document formats that can only be understood by humans. If you’re serving audio files, I’ve got nothing to say about which audio format you should choose. To a first approximation, a programmed client finds all audio files equally unintelligible.
The text/html media type is deprecated for XHTML. It’s also the only media type that Internet Explorer handles as HTML. If your service might be serving XHTML data directly to web browsers, you might want to serve it as text/html.
My number-one representation recommendation is the format I’ve been using in my own services throughout this book, and the one you’re probably most familiar with. HTML drives the human web, and XHTML can drive the programmable web. The XHTML standard (http://www.w3.org/TR/xhtml1/) relies on the HTML standard to do most of the heavy lifting (http://www.w3.org/TR/html401/).
XHTML is HTML under a few restrictions that make every XHTML document also valid XML. If you know HTML, you know most of what there is to know about XHTML, but there are some syntactic differences, like how to present self-closing tags. The tag names and attributes are the same: XHTML is expressive in the same ways as HTML. Since the XHTML standard just points to the HTML standard and then adds some restrictions to it, I tend to refer to “HTML tags” and the like except where there really is a difference between XHTML and HTML.
I don’t actually recommend HTML as a representation format, because it can’t be reliably parsed with an XML parser. There are many excellent and liberal HTML parsers, though (I mentioned a few in Chapter 2), so your clients have options if you can’t or don’t want to serve XHTML. Right now, XHTML is a better choice if you expect a wide variety of clients to handle your data.
HTML can represent many common types of data: nested lists (tags like ul and li), key-value pairs (the dl tag and its children), and tabular data (the table tag and its children). It supports many different kinds of hypermedia. HTML does have its shortcomings: its hypermedia forms are limited, and won’t fully support HTTP’s uniform interface until HTML 5 is released.
HTML is also poor in semantic content. Its tag vocabulary is very computer-centric. It has special tags for representing computer code and output, but nothing for the other structured fruits of human endeavor, like poetry. One resource can link to another resource, and there are standard HTML attributes (rel and rev) for expressing the relationship between the linker and the linkee. But the HTML standard defines only 15 possible relationships between resources, including “alternate,” “stylesheet,” “next,” “prev,” and “glossary.” See http://www.w3.org/TR/html401/types.html#type-links for a complete list.
Since HTML pages are representations of resources, and resources can be anything, these 15 relationships barely scratch the surface. HTML might be called upon to represent the relationship between any two things. Of course, I can come up with my own values for rel and rev to supplement the official 15, but if everyone does that confusion will reign: we’ll all pick different values to represent the same relationships. If I link my web page to my wife’s web page, should I specify my relationship to her as husband, spouse, or sweetheart? To a human it doesn’t matter much, but to a computer program (the real client on the programmable web) it matters a lot. Similarly, HTML can easily represent a list, and there’s a standard HTML attribute (class) for expressing what kind of list it is. But HTML doesn’t say what kinds of lists there are.
This isn’t HTML’s fault, of course. HTML is supposed to be used by people who work in any field. But once you’ve chosen a field, everyone who works in that field should be able to agree on what kinds of lists there are, or what kinds of relationships can exist between resources. This is why people have started getting together and adding standard semantics to XHTML with microformats.
Microformats are lightweight standards that extend XHTML to give domain-specific semantics to HTML tags. Instead of reinventing data storage techniques like lists, microformats use existing HTML tags like span and abbr. The semantic content usually lives in custom values for the attributes of the tags, such as class, rel, and rev. Example 9-1 shows someone’s home telephone number represented in the microformat known as hCard.
Example 9-1. A telephone number represented in the hCard microformat
<span class="tel"> <span class="type">home</span>: <span class="value">+1.415.555.1212</span> </span>
Microformat adoption is growing, especially as more special-purpose devices get on the web. Any microformat document can be embedded in an XHTML page, because it is XHTML. A web service can serve an XHTML representation that contains microformat documents, along with links to other resources and forms for creating new ones. This document can be automatically parsed for its microformat data, or rendered for human consumption with a standard web browser.
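To make “automatically parsed” concrete, here’s a minimal sketch in Python that pulls the telephone number out of a document like Example 9-1. The function name extract_tels and the wrapping div are my own inventions for illustration, and the code only handles the simple “type” and “value” fields shown above, not the whole hCard specification.

```python
import xml.etree.ElementTree as ET

# Example 9-1, wrapped in a hypothetical <div> so it forms a complete document.
XHTML = """<div>
  <span class="tel">
    <span class="type">home</span>:
    <span class="value">+1.415.555.1212</span>
  </span>
</div>"""

def extract_tels(xhtml):
    """Return one dict per class="tel" element, keyed by 'type' and 'value'."""
    results = []
    for element in ET.fromstring(xhtml).iter("span"):
        # The class attribute is a space-separated list of names.
        if "tel" not in element.get("class", "").split():
            continue
        fields = {}
        for child in element.iter("span"):
            classes = child.get("class", "").split()
            for name in ("type", "value"):
                if name in classes:
                    fields[name] = (child.text or "").strip()
        results.append(fields)
    return results

print(extract_tels(XHTML))
```

Because the microformat is ordinary XHTML, a stock XML parser is enough: the only microformat-specific knowledge in the code is the set of class names.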
As of the time of writing there were nine microformat specifications. The best-known is probably rel-nofollow, a standard value for the rel attribute invented by engineers at Google as a way of fighting comment spam on weblogs. Here’s a complete list of official microformats: hCalendar, hCard, rel-license, rel-nofollow, rel-tag, VoteLinks, XFN, XMDP, and XOXO.
rel-license is a way of specifying the license under which a web page is made available:
<a href="http://creativecommons.org/licenses/by-nd/" rel="license"> Made available under a Creative Commons Attribution-NoDerivs license. </a>
That’s standard XHTML. The only thing the microformat does is define a meaning for the string license when it shows up in the rel attribute.
VoteLinks is a set of values for the rev attribute that express an opinion of the linked resource:
<a rev="vote-for" href="http://www.example.com">The best webpage ever.</a>
<a rev="vote-against" href="http://example.com/">A shameless ripoff of www.example.com</a>
XFN stands for XHTML Friends Network. It’s a new set of values for the rel attribute, for capturing the relationships between people. An XFN value for the rel attribute captures the relationship between this “person” resource and another such resource. To bring back the “Alice” and “Bob” resources from “Relationships Between Resources” in Chapter 8, an XHTML representation of Alice might include this link:
<a rel="spouse" href="Bob">Bob</a>
XMDP stands for XHTML Meta Data Profiles. It’s a way of describing your custom values for XHTML attributes, using the XHTML tags for definition lists: DL, DD, and DT. This is a kind of meta-microformat: a microformat like rel-tag could itself be described with an XMDP document.
XOXO stands (sort of) for Extensible Open XHTML Outlines. It uses XHTML’s list tags to represent outlines. There’s nothing in XOXO that’s not already in the XHTML standard, but declaring a document (or a list in a document) to be XOXO signals that a list is an outline, not just a random list.
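Several of these microformats are nothing more than agreed-upon values for rel and rev, so a client can harvest them all with the same few lines. Here’s a rough sketch in Python, assuming a well-formed XHTML fragment; the function name link_relationships and the sample fragment are hypothetical, assembled from the links shown in this section.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment combining rel-license, VoteLinks, and XFN links.
PAGE = """<div>
  <a rel="license" href="http://creativecommons.org/licenses/by-nd/">License</a>
  <a rev="vote-for" href="http://www.example.com">The best webpage ever.</a>
  <a rel="spouse" href="Bob">Bob</a>
</div>"""

def link_relationships(xhtml):
    """Return an (attribute, value, href) tuple for every rel/rev value."""
    found = []
    for link in ET.fromstring(xhtml).iter("a"):
        for attribute in ("rel", "rev"):
            # Both attributes hold space-separated lists of values.
            for value in link.get(attribute, "").split():
                found.append((attribute, value, link.get("href")))
    return found

for relationship in link_relationships(PAGE):
    print(relationship)
```

What the client does with a value like “spouse” or “vote-for” is up to the client. The microformat only guarantees that everyone means the same thing by it.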
Those are the official microformat standards; they should give you an idea of what microformats are for. As of the time of writing there were also about 10 microformat drafts and more than 50 discussions about possible new microformats. Here are some of the more interesting drafts:
geo is a way of marking up latitude and longitude on Earth. This would be useful in the mapping application I designed in Chapter 5. I didn’t use it there because there’s still a debate about how to represent latitude and longitude on other planetary bodies: extend geo or define different microformats for each body?
xFolk is a way of representing bookmarks. This would make an excellent representation format for the social bookmarking application in Chapter 7. I chose to use Atom instead because it was less code to show you.
You get the idea. The power of microformats is that they’re based on HTML, the most widely-deployed markup format in existence. Because they’re HTML, they can be embedded in web pages. Because they’re also XML, they can be embedded in XML documents. They can be understood at various levels by human beings, specialized microformat processors, dumb HTML processors, and even dumber XML processors.
Even if the microformats wiki shows no microformat standard or draft for your problem space, you might find an open discussion on the topic that helps you clarify your data structures. You can also create your own microformat (see “Ad Hoc XHTML” later in this chapter).
Atom is an XML vocabulary for describing lists of timestamped entries. The entries can be anything, but they usually contain pieces of human-authored text like you’d see on a weblog or a news site. Why should you use an Atom list instead of a regular XHTML list? Because Atom provides special tags for conveying the semantics of publishing: authors, contributors, languages, copyright information, titles, categories, and so on. (Of course, as I mentioned earlier, there’s a microformat called hAtom that brings all of these semantics into XHTML.) Atom is a useful XML vocabulary because so many web services are, in the broad sense, ways of publishing information. What’s more, there are a lot of web service clients that understand the semantics of Atom documents. If your web service is addressable and your resources expose Atom representations, you’ve immediately got a huge audience.
Some feeds are written in some version of RSS, a different XML vocabulary with similar semantics. All versions of RSS have the same basic structure as Atom: a feed that contains a number of entries. RSS comes in several variants, but you shouldn’t have to worry about them: today, every major tool for consuming feeds understands Atom.
These days, most weblogs and news sites expose a special resource whose representation is an Atom feed. The entries in the feed describe and link to other resources: weblog entries or news stories published on the site. You, the client, can consume these resources with a feed reader or some other external program. In Chapter 7, I represented lists of bookmarks as Atom feeds. Example 9-2 shows a simple Atom feed document.
Example 9-2. A simple Atom feed containing one entry
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>RESTful News</title>
  <link rel="alternate" href="http://example.com/RestfulNews" />
  <updated>2007-04-14T20:00:39Z</updated>
  <author><name>Leonard Richardson</name></author>
  <contributor><name>Sam Ruby</name></contributor>
  <id>urn:1c6627a0-8e3f-0129-b1a6-003065546f18</id>
  <entry>
    <title>New Resource Will Respond to PUT, City Says</title>
    <link rel="edit" href="http://example.com/RestfulNews/104" />
    <id>urn:239b2f40-8e3f-0129-b1a6-003065546f18</id>
    <updated>2007-04-14T20:00:39Z</updated>
    <summary>
      After long negotiations, city officials say the new resource
      being built in the town square will respond to PUT. Earlier
      criticism of the proposal focused on the city's plan to modify the
      resource through overloaded POST.
    </summary>
    <category scheme="http://www.example.com/categories/RestfulNews"
              term="local" label="Local news" />
  </entry>
</feed>
In that example you can see some of the tags that convey the semantics of publishing: author, title, summary, updated, and so on. The feed as a whole is a joint project: it has an author tag and a contributor tag. It’s also got a link tag that points to an alternate URI for the underlying “feed” resource: the news site. The single entry has no author tag, so it inherits author information from the feed. The entry does have its own link tag, which points to http://example.com/RestfulNews/104. That URI identifies the entry as a resource in its own right. The entry also has a textual summary of the story. To get the remainder, the client must presumably GET the entry’s URI.
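Here’s a small sketch of how a client might apply the inheritance rule just described: an entry with no author tag takes its authors from the feed. It uses Python’s standard XML parser on a trimmed-down copy of the feed. The function name entry_authors is hypothetical, and a real client would more likely use a dedicated feed-parsing library.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # the Atom namespace, in ElementTree notation

# A trimmed-down version of the feed in Example 9-2.
FEED = b"""<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>RESTful News</title>
  <author><name>Leonard Richardson</name></author>
  <entry>
    <title>New Resource Will Respond to PUT, City Says</title>
    <link rel="edit" href="http://example.com/RestfulNews/104" />
  </entry>
</feed>"""

def entry_authors(feed_xml):
    """Map each entry title to its authors, falling back to the feed's."""
    feed = ET.fromstring(feed_xml)
    author_path = ATOM + "author/" + ATOM + "name"
    feed_authors = [name.text for name in feed.findall(author_path)]
    authors_by_title = {}
    for entry in feed.findall(ATOM + "entry"):
        own_authors = [name.text for name in entry.findall(author_path)]
        title = entry.findtext(ATOM + "title")
        # An entry without its own author inherits the feed-level authors.
        authors_by_title[title] = own_authors or feed_authors
    return authors_by_title

print(entry_authors(FEED))
```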
An Atom document is basically a directory of published resources. You can use Atom to represent photo galleries, albums of music (maybe a link to the cover art plus one to each track on the album), or lists of search results. Or you can omit the link tags and use Atom as a container for original content like status reports or incoming emails. Remember: the two reasons to use Atom are that it represents the semantics of publishing, and that a lot of existing clients can consume it.
If your application almost fits in with the Atom schema, but needs an extra tag or two, there’s no problem. You can embed XML tags from other namespaces in an Atom feed. You can even define a custom namespace and embed its tags in your Atom feeds. This is the Atom equivalent of XHTML microformats: your Atom feeds can use conventions not defined in Atom, without becoming invalid. Clients that don’t understand your tag will see a normal Atom feed with some extra mysterious data in it.
OpenSearch is one XML vocabulary that’s commonly embedded in Atom documents. It’s designed for representing lists of search results. The idea is that a service returns the results of a query as an Atom feed, with the individual results represented as Atom entries. But some aspects of a list of search results can’t be represented in a stock Atom feed: the total number of results, for instance. So OpenSearch defines three new elements, in the opensearch namespace: totalResults, itemsPerPage, and startIndex.
If all the search results are numbered from zero to totalResults, then the first result in this feed document is entry number startIndex. When combined with itemsPerPage you can use this to figure out what “page” of results you’re on.
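The page arithmetic is simple enough to sketch. The Python below reads the three OpenSearch elements out of a made-up feed and works out which page the client is on. I’m assuming the OpenSearch 1.1 namespace URI and the zero-based numbering described above; the function name page_info and the sample numbers are hypothetical.

```python
import xml.etree.ElementTree as ET

OPENSEARCH = "{http://a9.com/-/spec/opensearch/1.1/}"  # assumed: OpenSearch 1.1

# A made-up feed with the entries omitted for brevity.
FEED = b"""<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <opensearch:totalResults>47</opensearch:totalResults>
  <opensearch:startIndex>20</opensearch:startIndex>
  <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
</feed>"""

def page_info(feed_xml):
    """Return (current_page, total_pages), counting pages from 1."""
    feed = ET.fromstring(feed_xml)
    total = int(feed.findtext(OPENSEARCH + "totalResults"))
    start = int(feed.findtext(OPENSEARCH + "startIndex"))
    per_page = int(feed.findtext(OPENSEARCH + "itemsPerPage"))
    current_page = start // per_page + 1   # results are numbered from zero
    total_pages = -(-total // per_page)    # ceiling division
    return current_page, total_pages

print(page_info(FEED))  # (3, 5)
```

A client that doesn’t know about OpenSearch ignores the extra elements and still sees a valid Atom feed.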
Most graphic formats are just ways of laying pixels out on the screen. The underlying content is opaque to a computer: it takes a skilled human to modify a graphic or reuse part of one in another. Scalable Vector Graphics is an XML vocabulary that makes it possible for programs to understand and manipulate graphics. It describes graphics in terms of primitives like shapes, text, colors, and effects.
It would be a waste of time to represent a photograph in SVG, but using it to represent a graph, a diagram, or a set of relationships gives a lot of power to the client. SVG images can be scaled to arbitrary size without losing any detail. SVG diagrams can be edited or rearranged, and bits of them can be seamlessly snipped out and incorporated into other graphics. In short, SVG makes graphic documents work like other sorts of documents. Web browsers are starting to get support for SVG: newer versions of Firefox support it natively.
I covered this simple format in Chapter 6. This format is mainly used in representations the client sends to the server. A filled-out HTML form is represented in this format by default, and it’s an easy format for an Ajax application to construct. But a service can also use this format in the representations it sends. If you’re thinking of serving comma-separated values or RFC 822-style key-value pairs, try form-encoded values instead. Form-encoding takes care of the tricky cases, and your clients are more likely to have a library that can decode the document.
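As an illustration of the “tricky cases,” here’s a short Python sketch that round-trips values containing ampersands, equals signs, and commas through the standard library’s form-encoding functions. The sample data is made up.

```python
from urllib.parse import urlencode, parse_qs

# Made-up values containing characters that would break naive
# comma-separated or key=value formats.
data = {"name": "Alice & Bob", "note": "a=b, c"}

encoded = urlencode(data)     # special characters are percent-escaped
decoded = parse_qs(encoded)   # each key maps to a list of values

print(encoded)
print(decoded)
```

Note that parse_qs returns a list for every key, because a form-encoded document may repeat a key.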
As I show in Chapter 11, JSON has special advantages when it comes to Ajax applications. It’s useful for any kind of application, though. If your data structures are more complex than key-value pairs, or you’re thinking of defining an ad hoc XML format, you might find it easier to define a JSON structure of nested hashes and arrays.
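For instance, a nested structure of hashes and arrays goes to JSON and back with no custom parsing code. This Python sketch uses a made-up “bookmark” structure; the field names are hypothetical, not taken from any service in this book.

```python
import json

# A hypothetical bookmark: a hash containing an array and a nested hash.
bookmark = {
    "uri": "http://example.com/",
    "tags": ["rest", "http"],
    "owner": {"name": "alice"},
}

document = json.dumps(bookmark)   # serialize to a JSON string
restored = json.loads(document)   # parse it back into native structures

print(document)
```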
The Resource Description Framework is a way of representing knowledge about resources. Resource here means the same thing as in Resource-Oriented Architecture: a resource is anything important enough to have a URI. In RDF, though, the URIs might not be http: URIs. Abstract URI schemes like isbn: (for books) and urn: (for just about anything) are common. Example 9-4 is a simple RDF assertion, which claims that the title of this book is RESTful Web Services.
Example 9-4. An RDF assertion
<span about="isbn:9780596529260" property="dc:title"> RESTful Web Services </span>
There are three parts to an RDF assertion, or triple, as they’re called. There’s the subject, a resource identifier: in this case, isbn:9780596529260. There’s the predicate, which identifies a property of the resource: in this case, dc:title. Finally there’s the object, which is the value of the property: in this case, “RESTful Web Services.” The assertion as a whole reads: “The book with ISBN 9780596529260 has a title of ‘RESTful Web Services.’”
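To show how mechanical the subject-predicate-object reading is, here’s a Python sketch that turns Example 9-4 into a triple. It understands only the simple pattern shown there (one element carrying both “about” and “property”), not any complete RDF syntax, and the function name triples is hypothetical.

```python
import xml.etree.ElementTree as ET

# Example 9-4 as a one-element document.
SNIPPET = ('<span about="isbn:9780596529260" property="dc:title">'
           'RESTful Web Services</span>')

def triples(xhtml):
    """Return (subject, predicate, object) for each annotated element."""
    found = []
    for element in ET.fromstring(xhtml).iter():
        subject = element.get("about")
        predicate = element.get("property")
        if subject and predicate:
            # The element's text is the object of the assertion.
            found.append((subject, predicate, (element.text or "").strip()))
    return found

print(triples(SNIPPET))
```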
I didn’t make up the isbn: URI space: it’s a standard way of addressing books as resources. I didn’t make up the dc:title predicate, either. That comes from the Dublin Core Metadata Initiative. DCMI defines a set of useful predicates that apply to published works like books and weblogs. An automated client that understands the Dublin Core can scan RDF documents that use those terms, evaluate the assertions they contain, and even make logical deductions about the data.
Example 9-4 looks a lot like an XHTML snippet, because that’s what it is. There are a couple of ways of representing RDF assertions, and I’ve chosen to show you RDFa, a microformat-like standard for embedding RDF in XHTML. RDF/XML is a more popular RDF representation format, but I think it makes RDF look more complicated than it is, and it’s difficult to integrate RDF/XML documents into the web. RDFa documents can go into XHTML files, just like microformat documents. However, since RDFa takes some ideas from the unreleased XHTML 2 standard, a document that includes it won’t be valid XHTML for a while. A third way of representing RDF assertions is eRDF, which results in valid XHTML.
RDF in its generic form is the basis for the W3C’s Semantic Web project. On the human web, there are no standards for how we talk about the resources we link to. We describe resources in human language that’s difficult or impossible for machines to understand. RDF is a way of constraining human speech so that we talk about resources using a standard vocabulary—not one that machines “understand” natively, but one they can be programmed to understand. A computer program doesn’t understand the Dublin Core’s “dc:title” any more than it understands “title.” But if everyone agrees to use “dc:title,” we can program standard clients to reason about the Dublin Core in consistent ways.
Here’s the thing: I think microformats do a good job of adding semantics to the web we already have, and they add less complexity than RDF’s general subject-predicate-object form. I recommend using RDF only when you want interoperability with existing RDF processors, or are treating RDF as a general-purpose microformat for representing assertions about resources.
One very popular use of RDF is FOAF, a way of representing information about human beings and the relationships between them.
I’m talking here about informal XML vocabularies used by frameworks like Ruby’s ActiveRecord and Python’s Django to serialize database objects as XML. I gave an example back in Example 7-4. It’s a simple data structure: a hash or a list of hashes.
These representation formats are very convenient if you happen to be writing a service in a framework that gives you access to one. In Rails, you can just call to_xml on an ActiveRecord object or a list of such objects. The Rails serialization format is also useful if you’re not using Rails, but you want your service to be usable by ActiveResource clients. Otherwise, I don’t really recommend these formats, unless you’re just trying to get something up and running quickly (as I am in some chapters of this book). The major downside of these formats is that they look like documents, but they’re really just serialized data structures. They never contain hypermedia links or forms.
If none of the work that’s already been done fits your problem space... well, first, think again. Just as you should think again before deciding you can’t fit your resources into HTTP’s uniform interface. If you think your resources can’t be represented by stock HTML or Atom or RDF or JSON, there’s a good chance you haven’t looked at the problem in the right way.
But it’s quite possible that your resources won’t fit any of the representation formats I’ve mentioned so far. Or maybe you can represent most of your resource state with XHTML plus some well-chosen microformats, but there’s still something missing. The next step is to consider creating your own microformat.
The high-impact way of creating a microformat is to go through the microformat process, hammer it out with other microformat enthusiasts, and get it published as an official microformat. This is most appropriate when lots of people are trying to represent the same kind of data. Ideally, you’re in a situation where the human web is littered with ad hoc HTML representations of the data, and where there are already a couple of big standards that can serve as a model for a more agile microformat. This is how the hCard and hCalendar microformats were developed. There were many people trying to put contact information and upcoming events on the human web, and preexisting standards (vCard and iCalendar) to steal ideas from. The representation of “places on a map” that I devised in Chapter 5 might be a starting point for an official microformat. There are lots of mapping sites on the human web, and lots of heavyweight standards for representing GIS data. If I wanted to build a microformat, I’d have a lot of ideas to work from.
The low-impact way of creating a microformat is to add semantic content to the XHTML you were going to write anyway. This is suitable for representation formats that no one else is likely to use, or as a starting point so you can get a real web service running while you’re going through the microformat process. The representation of the list of planets from Chapter 5 works better as an ad hoc set of semantics than as an official microformat. All it’s doing is saying that one particular list is a list of planets.
The microformat design patterns and naming principles give a set of sensible general rules for adding semantics to HTML. Their advice is useful even if you’re not trying to create an official microformat. The semantics you choose for your “micromicroformat” won’t be standardized, but you can present them in a standard way: the way microformats do it. Here are some of the more useful patterns.
If there’s an HTML tag that conveys the semantics you want, use it. To represent a set of key-value pairs, use the dl tag. To represent a list, use one of the list tags. If nothing fits, use the span or div tag.
Give a tag additional semantics by specifying its class attribute. This is especially important for span and div, which have no real meaning on their own.
Use the rel attribute in a link to specify another resource’s relationship to this one. Use the rev attribute to specify this page’s relationship to another one. If the relationship is symmetric, use rel. See “Hypermedia Technologies” later in this chapter for more on these attributes.
Consider providing an XMDP file that describes your custom values for XHTML attributes.
In addition to XHTML, Atom, and SVG, there are a lot of specialized XML vocabularies I haven’t covered: MathML, OpenDocument, Chemical Markup Language, and so on. There are also specialized vocabularies you can use in RDF assertions, like Dublin Core and FOAF. A web service might serve any of these vocabularies as standalone representations, embed them into Atom feeds, or even wrap them in SOAP envelopes. If none of these work for you, you can define a custom XML vocabulary to represent your resource state, or maybe the parts that Atom doesn’t cover.
Although I’ve presented this as the last resort, that’s certainly not the common view. People come up with custom XML vocabularies all the time: that’s how there got to be so many of them. Almost every real web service mentioned in this book exposes its representations in a custom XML vocabulary. Amazon S3, Yahoo!’s search APIs, and the del.icio.us API all serve representations that use custom XML vocabularies, even though they could easily serve Atom or XHTML and reuse an existing vocabulary.
Part of this is tech culture. The microformats idea is fairly new, and a custom XML vocabulary still looks more “official.” But this is an illusion. Unless you provide a schema definition for your vocabulary, your custom tags have exactly the same status as a custom value for the HTML “class” attribute. Even a definition does nothing but codify the vocabulary you made up: it doesn’t confer any legitimacy. Legitimacy can only come “from the consent of the governed”: from other people adopting your vocabulary.
That said, there is a space for custom XML vocabularies. It’s usually easy to use XHTML instead of creating your own XML tags, but it’s not so easy when you need tags with a lot of custom attributes. In that situation, a custom XML vocabulary makes sense. All I ask is that you seriously think about whether you really need to define a new XML vocabulary for a given problem. It’s possible that in the future, people will err in the opposite direction, and create ad hoc microformats when they shouldn’t. Then I’ll urge caution before creating a microformat. But right now, the problem is too many ad hoc XML vocabularies.
It’s a global world (I actually heard someone say that once), and any service you expose must deal with the products of people who speak different languages from you and use different writing systems. You don’t have to understand all of these languages, but to handle multilingual data without mangling it, you do need to know something about character encodings: the conventions that let us represent human-readable text as strings of bytes.
Every text file you’ve ever created has some character encoding, even though you probably never made a decision about which encoding to use (it’s usually a system property). In the United States the encoding is usually UTF-8, US-ASCII, or Windows-1252. In western Europe it might also be ISO 8859-1. The default for HTML on the web is ISO 8859-1, which is almost but not quite the same as Windows-1252. Japanese documents are commonly encoded with EUC-JP, Shift_JIS, or UTF-8. If you’re curious about what character encodings are used in different places, most web browsers list the encodings they understand. My web browser supports five different encodings for simplified Chinese, five for Hebrew, nine for the Cyrillic alphabet, and so on. Most of these encodings are mutually incompatible, even when they encode the same language. It’s insane!
Fortunately there is a way out of this confusion. We as a species have come up with Unicode, a way of representing every human writing system. Unicode isn’t a character encoding, but there are two good encodings for it: UTF-8 (more efficient for alphabetic languages like English) and UTF-16 (more efficient for logographic languages like Japanese). Either of these encodings can handle text written in any combination of human languages. The best single decision you can make when handling multilingual data is to keep all of your data in one of these encodings: probably UTF-8 unless you live or do a lot of business in east Asia, then maybe UTF-16 with a byte-order mark.
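The efficiency difference is easy to see for yourself. This Python sketch encodes one English string and one Japanese string both ways and counts the bytes. The sample strings are arbitrary, and the function name encoded_sizes is hypothetical.

```python
def encoded_sizes(text):
    """Return (UTF-8 byte count, UTF-16 byte count) for a string."""
    utf8_size = len(text.encode("utf-8"))
    # "utf-16-le" fixes the byte order, so no 2-byte byte-order mark
    # is added to the count.
    utf16_size = len(text.encode("utf-16-le"))
    return utf8_size, utf16_size

print(encoded_sizes("RESTful Web Services"))  # (20, 40)
print(encoded_sizes("日本語のテキスト"))        # (24, 16)
```

The English text doubles in size under UTF-16, while the Japanese text shrinks by a third. Either way, both encodings represent both strings without loss.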
This might be as simple as making a decision when you start the project, or you may have to convert an existing database. You might have to install an encoding converter to work on incoming data, or write encoding detection code. (The Universal Encoding Detector is an excellent autodetection library for Python. It’s got a Ruby port, available as the chardet gem.) It might be easy or difficult. But once you’re keeping all of this data in one of the Unicode encodings, most of your problems will be over. When your clients send you data in a weird encoding, you’ll be able to convert it to your chosen UTF-* encoding. If they send data that specifies no encoding at all, you’ll be able to guess its encoding and convert it, or reject it.
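Here’s a minimal sketch of that convert-guess-or-reject policy in Python, using only the standard library. The function name to_utf8 and the fallback order (UTF-8, then Windows-1252) are assumptions I’ve made for illustration; a real service would more likely hand the guessing to a statistical detector like the one mentioned above.

```python
def to_utf8(raw, declared=None, fallbacks=("utf-8", "windows-1252")):
    """Convert incoming bytes to UTF-8, or raise ValueError to reject them.

    `declared` is the encoding the client claimed, if any. The fallback
    list is a crude stand-in for real encoding detection.
    """
    candidates = ([declared] if declared else []) + list(fallbacks)
    for name in candidates:
        try:
            return raw.decode(name).encode("utf-8")
        except (UnicodeDecodeError, LookupError):
            # Wrong guess, or an encoding name Python doesn't know.
            continue
    raise ValueError("could not determine the character encoding")

# The client declared its encoding: convert directly.
print(to_utf8("café".encode("iso-8859-1"), declared="iso-8859-1"))

# No declaration: the bytes aren't valid UTF-8, so the fallback applies.
print(to_utf8(b"caf\xe9"))
```

Trying UTF-8 first is a reasonable default because text in other encodings rarely happens to be valid UTF-8, but the guess can still be wrong for short inputs.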
The other half of the equation is communicating with your clients: how do you tell them which encoding you’re using in your outgoing representations? Well, XML lets you specify a character encoding on the very first line:
<?xml version="1.0" encoding="UTF-8"?>
All but one of my recommended representation formats is based on XML, so that solves most of the problem. But there is an encoding problem with that one outlier, and there’s a further problem in the relationship between XML and HTTP.
An XML document can and should define a character encoding in its first line, so that the client will know how to interpret the document. An HTTP response can and should specify a value for the Content-Type response header, so that the client knows it’s being given an XML document and not some other kind. But the Content-Type header can also specify a document character encoding with “charset,” and this encoding might conflict with what it actually says in the document:
Content-Type: application/xml; charset="ebcdic-fr-297+euro"
<?xml version="1.0" encoding="UTF-8"?>
Who wins? Surprisingly, HTTP’s character encoding takes precedence over the encoding in the document itself. If the document says “UTF-8” and Content-Type says “ebcdic-fr-297+euro,” then extended French EBCDIC it is. Almost no one expects this kind of surprise, and most programmers write code first and check the RFCs later. The result is that the character encoding, as specified in Content-Type, tends to be unreliable. Some servers claim everything they serve is UTF-8, even though the actual documents say otherwise.
When serving XML documents, I don’t recommend going out of your way to send a character encoding as part of Content-Type. You can do it if you’re absolutely sure you’ve got the right encoding, but it won’t do much good. What’s really important is that you specify a document encoding. (Technically you can do without a document encoding if you’re using UTF-8, or UTF-16 with a byte-order mark. But if you have that much control over the data, you should be able to specify a document encoding.) If you’re writing a web service client, be aware that any character encoding specified in Content-Type may be incorrect. Use common sense to decide which encoding declaration to believe, rather than relying on a counterintuitive rule from an RFC a lot of people haven’t read.
Another note: when you serve XML documents, you should serve them with a media type of application/xml, not text/xml. If you serve a document as text/xml with no charset, the correct client behavior is to totally ignore the encoding specified in the XML document and interpret the XML document as US-ASCII. Avoid these complications altogether by always serving XML as application/xml, and always specifying an encoding in the first line of your XML documents.
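The rules in this section can be sketched as a small Python function that reports which encoding a strictly conforming client would use. This is my own simplified reading of the rules as described above, not a complete implementation of RFC 3023. The name effective_encoding is hypothetical, and the code ignores byte-order marks, assuming UTF-8 when the document declares nothing.

```python
import re
from email.message import Message

# Matches the encoding pseudo-attribute in an XML declaration.
XML_DECLARATION = re.compile(
    rb"""^<\?xml[^>]*?encoding=["']([A-Za-z0-9._-]+)["']""")

def effective_encoding(content_type, body):
    """Return the encoding a by-the-book client would apply to `body`."""
    header = Message()
    header["Content-Type"] = content_type
    charset = header.get_content_charset()
    if charset:
        return charset       # an HTTP charset beats the document's claim
    if header.get_content_type() == "text/xml":
        return "us-ascii"    # text/xml without a charset means US-ASCII
    match = XML_DECLARATION.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"           # simplification: no BOM sniffing

DOCUMENT = b'<?xml version="1.0" encoding="UTF-8"?><doc/>'
print(effective_encoding('application/xml; charset="ebcdic-fr-297+euro"',
                         DOCUMENT))
print(effective_encoding("text/xml", DOCUMENT))
print(effective_encoding("application/xml", DOCUMENT))
```

Only the third call, application/xml with no charset, lets the document speak for itself, which is why that combination is the one to serve.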
I didn’t mention plain text in my list of recommended representation formats, mostly because plain text is not a structured format, but also because the lack of structure means there’s no way to specify the character encoding of “plain text.” JSON is a way of structuring plain text, but it doesn’t solve the character encoding problem. Fortunately, you don’t have to solve it yourself: just follow the standard convention.
RFC 4627 states that a JSON file must contain Unicode characters, encoded in one of the UTF-* encodings. Practically, this means either UTF-8, or UTF-16 with a byte-order mark. Plain US-ASCII will also work, since ASCII text happens to be valid UTF-8. Given this restriction, a client can determine the character encoding of a JSON document by looking at the first four bytes (the details are in RFC 4627), and there’s no need to specify an explicit encoding. You should follow this convention whenever you serve plain text, not just JSON.
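The first-four-bytes trick works because a JSON text always starts with two ASCII characters, so the position of the zero bytes gives away the encoding. Here’s a Python sketch of the detection table from RFC 4627. The function name detect_json_encoding is hypothetical, and the code assumes the document does not begin with a byte-order mark, which would need to be checked for and stripped first.

```python
def detect_json_encoding(data):
    """Guess a JSON document's encoding from where its zero bytes fall."""
    if len(data) < 4:
        return "utf-8"   # too short to be UTF-16 or UTF-32 JSON
    zero_pattern = tuple(byte == 0 for byte in data[:4])
    table = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    return table.get(zero_pattern, "utf-8")       # xx xx xx xx

document = '{"title": "RESTful Web Services"}'
for encoding in ("utf-8", "utf-16-le", "utf-16-be"):
    raw = document.encode(encoding)
    print(encoding, "->", detect_json_encoding(raw))
```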
 OpenSearch also defines a simple control flow: a special kind of resource called a “description document.” I’m not covering OpenSearch description documents in this book, mainly for space reasons.
 This is specified, and argued for, in RFC 3023.