Chapter 1. Introducing XPath and XPointer

The XPath and XPointer specifications promulgated by the World Wide Web Consortium (W3C) aim to simplify the location of XML-based content. With software based on those two specs, you’re freed of much of the tedium of finding out if something useful is in a document, so you can simply enjoy the excitement of doing something with it.

Before getting specifically into the details of XPath or XPointer, though, you should have a handle on some concepts and other background the two specs have in common. Don’t worry, the details — and there are enough, it seems, to fill a phone directory (or this book, at least) — are coming.

Why XPath and XPointer?

Detailed answers to the following questions are implicit throughout this book and explicit in a couple of spots:

Why should I care about XPath and XPointer? What do they even do?

To answer them briefly for now, consider even a simple XML document, such as this:

<house_pet_hazards>
   <hazard type="cleanup">
      <name>hairballs</name>
      <guilty_party species="cat">Dilly</guilty_party>
      <guilty_party species="cat">Nameless</guilty_party>
      <guilty_party species="cat">Katie</guilty_party>
   </hazard>
   <hazard type="cleanup">
      <name>miscellaneous post-ingestion surprises</name>
      <guilty_party species="cat">Dilly</guilty_party>
      <guilty_party species="cat">Katie</guilty_party>
      <guilty_party species="dog">Kianu</guilty_party>
      <guilty_party species="snake">Mephisto</guilty_party>
   </hazard>
   <hazard type="phys_jeopardy">
      <name>underfoot instability</name>
      <guilty_party species="cat">Dilly</guilty_party>
      <guilty_party species="snake">Mephisto</guilty_party>
   </hazard>
</house_pet_hazards>

Even so simple a document as this opens the door to dozens of potential questions, from the obvious (“Which pets have been guilty of tripping me up as I walked across the room?”) to the non-obvious, even baroque (“Which species is most likely to cause a problem for me on a given day?” and “For hazards requiring cleanup, is there a correlation between the species and the number of letters in a given pet’s name?”). For real-world XML applications — the ones inspiring you to research XPath/XPointer in the first place — the number of such practical questions might be in the thousands.

XPath provides you with a standard tool for locating the answers to real-world questions — answers contained in an XML document’s content or hidden in its structure. For its part, XPointer (which in part is built on an XPath foundation) provides you with standard mechanisms for creating references to parts of XML documents and using them as addresses.
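
To make that concrete, here is a minimal sketch that asks two of the questions above of the pet-hazards document. Python is an assumption made purely for illustration, as are the variable names; the sketch uses the standard library’s xml.etree.ElementTree module, which understands only a limited subset of XPath. Note that the path expression does the locating, while the host language does the counting.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The sample document from above, reproduced so the sketch is self-contained.
DOC = """
<house_pet_hazards>
   <hazard type="cleanup">
      <name>hairballs</name>
      <guilty_party species="cat">Dilly</guilty_party>
      <guilty_party species="cat">Nameless</guilty_party>
      <guilty_party species="cat">Katie</guilty_party>
   </hazard>
   <hazard type="cleanup">
      <name>miscellaneous post-ingestion surprises</name>
      <guilty_party species="cat">Dilly</guilty_party>
      <guilty_party species="cat">Katie</guilty_party>
      <guilty_party species="dog">Kianu</guilty_party>
      <guilty_party species="snake">Mephisto</guilty_party>
   </hazard>
   <hazard type="phys_jeopardy">
      <name>underfoot instability</name>
      <guilty_party species="cat">Dilly</guilty_party>
      <guilty_party species="snake">Mephisto</guilty_party>
   </hazard>
</house_pet_hazards>
"""

root = ET.fromstring(DOC)  # root is the house_pet_hazards element

# "Which pets have been guilty of tripping me up?"
# Select hazard elements whose name child reads 'underfoot instability',
# then step down to their guilty_party children.
trippers = [
    gp.text
    for gp in root.findall("hazard[name='underfoot instability']/guilty_party")
]

# "Which species turns up most often?"
# The nodes are located by element name; the tallying is ordinary Python.
species_counts = Counter(gp.get("species") for gp in root.iter("guilty_party"))

print(trippers)
print(species_counts.most_common(1))
```

Run as is, the first line printed lists Dilly and Mephisto, the two pets named under the “underfoot instability” hazard.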

On a practical level, if you know and become comfortable with XPath, you’ll have prepared yourself for easy use not only of XPointer but also of numerous other XML-related specifications, notably Extensible Stylesheet Language Transformations (XSLT) and XQuery. Knowing XPointer provides you with a key to a smaller castle (the XLink standard for advanced hyperlinking capabilities within or among portions of documents) but without that key the door is barred.

Antecedents/History

An interesting portion of many W3C specs is the list of non-normative (or simply “other”) references at the end. After wading through all the dry prose whose overarching purpose is the removal of ambiguity (sometimes at the expense of clarity and terseness), in this section you get to peek into the minds and personalities of the specs’ authors. (The “non-normative” says, in effect, that the resources listed here aren’t required reading — although they may have profoundly affected the authors’ own thinking about the subject.)

The XPath specification’s “other references,” without exception, are other formally published standards from the W3C or other (quasi-)official institutions. But XPath, as you will see, is a full-blown standard (the W3C refers to these as “recommendations”). XPointer is still a bit ragged around the edges at the time of this writing, and its non-normative references (Appendix A.2 of the XPointer xpointer( ) Scheme) are consequently more revealing of the background. This is especially useful, because there is some overlap in the membership of the W3C Working Groups (WGs) that produced XPointer and XPath.

Following is a brief look at a few of the most influential historical antecedents for XPath and XPointer.

DSSSL

The Document Style Semantics and Specification Language (DSSSL) was developed as a means of defining the presentation characteristics of SGML documents. Based syntactically on a programming language called Scheme, DSSSL does for SGML roughly what XSLT does for XML: it identifies for a DSSSL processor portions of the structure of an input document and how to behave once those portions are located.

Of particular interest in relation to this book’s subject matter is DSSSL’s core query language. This is the portion of a DSSSL instruction that locates content of a particular kind in an SGML document. For instance:

(element bottle
   [...instructions...])

tells the processor to follow the steps outlined in [...instructions...] for each occurrence of a bottle element in the source document. You can also navigate to various portions of the source document based on context. For example, the following starts with the current node (the portion of the source document with which the processor is currently working) to locate the nearest packaging ancestor:

(ancestor packaging (current-node)
   [...instructions...])

An ancestor is the parent of a given node, or that parent’s parent, and so on up the tree of nodes to the document root. The concepts of a tree of nodes, ancestors, children, and the like all made their way eventually into XPath.
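
The ancestor idea can be sketched outside DSSSL as well. The following Python fragment is an illustration only, not a rendering of DSSSL semantics: the shipment document, the kind attribute, and the helper names ancestors and nearest_ancestor are all hypothetical, and only the bottle and packaging element names come from the examples above.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document, invented for this sketch.
DOC = """
<shipment>
   <packaging kind="crate">
      <packaging kind="carton">
         <bottle>aspirin</bottle>
      </packaging>
   </packaging>
</shipment>
"""

root = ET.fromstring(DOC)

# ElementTree nodes do not record their parents, so build a child-to-parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """Yield the parent, that parent's parent, and so on up to the root element."""
    while node in parent_of:
        node = parent_of[node]
        yield node


def nearest_ancestor(node, tag):
    """Return the closest ancestor with the given element name, or None."""
    return next((a for a in ancestors(node) if a.tag == tag), None)


bottle = root.find(".//bottle")  # stands in for the "current node"
print([a.tag for a in ancestors(bottle)])
print(nearest_ancestor(bottle, "packaging").get("kind"))
```

The walk stops at the outermost element because ElementTree has no separate node above it; that is a limitation of this sketch rather than of the tree model itself.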

XSL

In August 1997, even before XML 1.0 became a W3C Recommendation itself, the W3C received a first stab at a language for describing how an XML document’s contents should be displayed, such as in a web browser. The initial proposal called for the creation of the Extensible Stylesheet Language (XSL). The W3C began work on its own version of XSL in mid-1998, and the complete XSL only reached Recommendation status in October 2001. Along the way, its editors recognized its complex nature: like DSSSL, XSL included both a language for locating content in a source document and a language for describing processor behavior upon locating that content.

The principal editor of the XSL specification was James Clark, who had previously developed the widely used Jade DSSSL processor. Unsurprisingly, then, XSL could be characterized as a DSSSL wolf in an XML sheep’s clothing. Taken together, the specification of which portion of the source tree an instruction referred to, and the instruction itself, were referred to as construction rules. The implication of this term was that for a given bit of source tree content, the XSL stylesheet would construct a particular result. A simple XSL construction rule might look something like this:

<rule>
   <target-element type="bottle"/>
   <p font-size="12pt">
      <children/>
   </p>
</rule>

The XSL processor would, for each occurrence of a bottle element in the source tree, construct a resulting p element with the indicated font-size attribute; the processor would then proceed to handle any children of that bottle element, placing the results inside the p element. (Elsewhere in the stylesheet, presumably, would be construction rules describing what to do with these children.)

One problem with XSL, as you can see above, is that it indiscriminately mixed elements from its own vocabulary (such as rule, target-element, and children) with those from the resulting documents (p, in this example). This was a perfect case for the use of namespaces, which XSL integrated when that specification was ready.

XSL went through a couple of Working Draft iterations before a light bulb went on over the editors’ heads: the ability to locate content in an XML source tree fit a general-purpose need, not only for XSL transformations from source to result but also for other applications, such as XPointer and eventually XQuery. The W3C eventually split the original XSL project into XSLT and XSL-Formatting Objects (XSL-FO, covered in the main XSL specification), and XPath emerged as a separate entity from XSLT soon after. XSLT and XPath reached Recommendation status in late 1999, well ahead of the rest of XSL.

TEI

The venerable and influential Text Encoding Initiative (TEI) first appeared in 1994 as a joint product of three professional/academic bodies: the Association for Computers and the Humanities (ACH), the Association for Computational Linguistics (ACL), and the Association for Literary and Linguistic Computing (ALLC).

Tip

An authoritative list of references on the TEI is provided at http://www.uic.edu/orgs/tei. As one of the resources there notes, the 1994 publication of “Guidelines for Text Encoding and Interchange” followed five years of work — venerable indeed.

TEI’s main product was a series of several hundred “textual feature definitions” in the form of extensible SGML elements and attributes. With some exceptions, these SGML-based features are readily understandable by anyone familiar with XML DTDs. Among the supplementary tagsets provided is a group whose purpose is to establish links from one portion of an SGML document to another within the same document or from one SGML document to a completely separate one. (If this already sounds familiar, no surprise there: these concepts later were carried over not just to the relatively recent XPath and XPointer, but much earlier to HTML itself.)

Particularly important for XPath and XPointer was TEI’s notion of extended pointers. A regular TEI link or cross-reference depended on such language features as the SGML equivalent of XML’s ID- and IDREF-type attributes for its operation. Extended pointers went further, permitting you to locate content on the basis of the content’s markup structure. As a TEI tutorial on “Cross-References and Links” (at http://www.tei-c.org/Lite/U5-ptrs.html) puts it:

In this language, locations are defined as a series of steps, each one identifying some part of the document, often in terms of the locations identified by the previous step. For example, you would point to the third sentence of the second paragraph of chapter two by selecting chapter two in the first step, the second paragraph in the second step, and the third sentence in the last step. A step can be defined in terms of SGML concepts (such as parent, descendent, preceding, etc.) or, more loosely, in terms of text patterns, word, or character positions.

Without this essential concept, it’s doubtful that XPath and XPointer would have emerged in the form they ultimately adopted.
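
The step-by-step idea in that quotation can be sketched in a few lines. This is an illustration under stated assumptions, not TEI extended-pointer syntax: it is written in Python with the standard library’s xml.etree.ElementTree module, and the book document and its element names (chapter, p, and s for sentence) are invented for the purpose.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the text of each sentence records where it sits.
DOC = """
<book>
   <chapter><p><s>c1 p1 s1</s></p></chapter>
   <chapter>
      <p><s>c2 p1 s1</s></p>
      <p><s>c2 p2 s1</s><s>c2 p2 s2</s><s>c2 p2 s3</s></p>
   </chapter>
</book>
"""

book = ET.fromstring(DOC)

# One step at a time: each step is evaluated relative to the previous result.
chapter_two = book.find("chapter[2]")
second_para = chapter_two.find("p[2]")
third_sentence = second_para.find("s[3]")

# The same three steps written as a single path.
same_sentence = book.find("chapter[2]/p[2]/s[3]")

print(third_sentence.text)
print(third_sentence is same_sentence)
```

Whether the steps are taken one by one or chained into a single path, they arrive at the same node, which is the point of defining each step in terms of the one before it.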

Tip

Note that the most specific form of HTML linking possible depends on the presence of named targets in the resource to which you’re linking. The smartest HTML link doesn’t have any intelligence remotely like that described in the above quotation.

Intermedia

Even before work began on the TEI Guidelines, various individuals at Brown University had been exploring the possibilities of what they called hypertext. (The term itself was coined in the 1960s by Ted Nelson, who by 1968 was an instructor at Brown.) In 1988, the group published “Intermedia: The Concept and the Construction of a Seamless Information Environment” in the professional journal IEEE Computer.

Intermedia was an ambitious research project that came, in time, to include such features as text and graphics editors, a timeline editor, and so on. One of its crucial features was dubbed the “Web view.” (Remember, this was in the mid- to late 1980s. A capital-W Web existed in almost no one else’s mind at the time.)

The thorny problem that Intermedia’s Web view attempted to tackle was the possibility of becoming “lost in hyperspace.” As the number of hypertext documents (and the points within them) multiplied, the number of possible links among them quickly grew out of control — to the point of unintelligibility.

The Web view’s seminal contribution to the future of hypertext media — certainly as codified in XPath and XPointer — was its provision for considering only the local context. Instead of trying to deal with all possible links from a given point to all other points, this local map view of the hypertext world allowed you to focus on a single (albeit constantly shifting) path: start at A, then proceed to B (which shares some relationship with A), then to C, and so on. As you will see by the end of this book, while concentrating on individual paths causes you to lose sight of the “big picture,” it also enables you to get from any given point to any other. (Tellingly, Intermedia itself eventually dropped support for the big-picture “global maps,” having learned they were so complicated that no one wanted to use them anyway.)

XPath, XPointer, and Other XML-Related Specs

It’s highly unlikely, if you’re at the point of wanting to learn about XPath and XPointer, that you’ll be surprised by one ugly reality: everything in XML seems to hinge on everything else. For a brief period of a year or two, you could pick up a couple of general-purpose books on XML and learn everything you needed to know about the subject; that time is long gone.

So let’s pretend that XML as a whole is represented graphically by one of Intermedia’s global maps. It’s a mess, isn’t it? There’s no way to figure it all out, even if by “it” you just mean that part of it relating to XPath and XPointer — or so it seems. But let’s narrow the focus a bit, following the Intermedia Web view’s local-map approach.

Let’s start with XPath. Successfully getting your mind around XPath currently requires that you have some knowledge of XML itself (including such occasionally overlooked little dark corners as ID-type attributes and whitespace handling). It also requires that you “get” at least the rudiments of XML namespaces.[1]

XPointer is a bit more complicated. First, it’s built principally on an XPath foundation. While it’s possible to use XPointer with no knowledge at all of XPath, the range of applications in which you can do so is quite limited.

Second, XPointers themselves are used by the XLink standard for linking from one XML resource to another (in the same or a different document). You can come to understand how to use XPointers quite completely without ever actually using them, and hence without any working knowledge of XLink; nonetheless, an elementary grasp of at least basic XLink terminology and syntax is necessary for true understanding.

Third, a couple of XML-related standards — XML Base and the XML Infoset — are referenced by the XPointer spec but don’t require that you understand much about them to effectively use XPointer.

Finally, as you will see, an ability to use XPointer depends to a certain extent on a number of non-XML standards (particularly, Internet media types, URIs, and character encodings).

Tip

Don’t panic; I’ll cover what you need to know of these more-obscure standards when the need arises.

In short, the route to XPath and XPointer mastery might look something like Figure 1-1.

Figure 1-1. Interdependencies among XML-related standards

In this diagram, the connections you really have to be concerned with are the ones depicted with solid lines; the connections — and the one box — depicted with dashed lines will be of less critical concern.

Specs Dependent on XPath and XPointer

The other side — not what you need to know to use XPath and XPointer, but what you need to know XPath and XPointer for — is rich. (One of this book’s early reviewers said that she gets “quite excited” by the range. I’m not sure I’d go that far, but I take her point.) Here’s a sampling.

First, XPath. As you already know from what I’ve covered, you can use XPath to leverage yourself into practical use of XSLT, XPointer, and XQuery. XPath syntax is also used in the following standards, which need to refer to portions of XML documents:

XPointer is more of a special-purpose tool than XPath and its range of usefulness is therefore narrower. You already know about its usefulness to XLink. However, XPointer is also at the heart of the XInclude spec for incorporating fragments of one document within another. You can find the current version of XInclude at http://www.w3.org/TR/xinclude/.

XPath and XPointer Versus XQuery

To get one other important question out of the way immediately: XPath and XPointer are not XQuery. The latter is a recent addition to the (rather crowded) gallery of the W3C’s XML-related standards. Its purpose is to provide to XML-based data stores some (ideally all) of the advantages of Structured Query Language (SQL) in the relational-database world. In SQL, a very simple query might look something like this:

SELECT emp_id, emp_name
FROM emp_table
WHERE emp_id = '73519'

As you can see, this comprises a straightforward mix of SQL keywords (shown here in uppercase), the names of relational tables and fields, operators (such as the equals sign), and literal values (such as 73519). The result of “running” such a query is a list, in table form (that is, rows and columns), of data values.

The XQuery form of the above SQL query might look as follows (note in particular the relationship between the above WHERE clause and the XPath-based portion of the XQuery query, //employee[emp_id = "73519"]):

{for $b in document("emp_table.xml")//employee[emp_id = "73519"]
   return
      <p>{ $b/emp_id }{ $b/emp_name }</p>
}

The result of “running” this query is a well-formed XML document or document fragment, such as:

<p>
   <emp_id>73519</emp_id>
   <emp_name>DeGaulle,Charles</emp_name>
</p>
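
For comparison, here is a rough sketch of the same lookup done with a path expression alone, leaving the construction of the result to a host language. Everything about it is an assumption for illustration: the choice of Python and its xml.etree.ElementTree module, the layout of emp_table.xml (inlined here as a string), and the second employee record, which is invented so that the filter has something to exclude.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents for emp_table.xml.
EMP_TABLE = """
<emp_table>
   <employee>
      <emp_id>73519</emp_id>
      <emp_name>DeGaulle,Charles</emp_name>
   </employee>
   <employee>
      <emp_id>73520</emp_id>
      <emp_name>Example,Second</emp_name>
   </employee>
</emp_table>
"""

table = ET.fromstring(EMP_TABLE)

results = []
# The path plays the role of FROM and WHERE: it picks the matching employees.
for emp in table.findall(".//employee[emp_id='73519']"):
    # Building the p element plays the role of SELECT, or of XQuery's return.
    p = ET.Element("p")
    for field in ("emp_id", "emp_name"):
        ET.SubElement(p, field).text = emp.findtext(field)
    results.append(ET.tostring(p, encoding="unicode"))

print(results)
```

The division of labor is the thing to notice: the path only locates nodes, and everything about shaping the output happens outside it.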

XQuery is still wending its way through the sometimes-tortuous route prescribed for all W3C specifications; at the time of this writing, it’s still a Working Draft, dated April 2002. A number of controversies swirl about it. First is that, while its equivalent of the SQL WHERE clause is based on XPath, it’s not quite XPath as you will come to understand it. (The XPath-based portion of the above XQuery statement is //employee[emp_id = "73519"].) Second, XQuery’s approach to returning an XML result from an XML source conflicts with the approach taken by the XSLT spec for the same purpose. And third is the XQuery syntax itself, which though vaguely resembling XML,[2] is not exactly XML. The “meaning” of an XQuery query is bound up not in elements and attributes but in special element text content delimited by curly braces (the { and } characters).

Now, there are valid reasons for not using pure XML syntax in general-purpose languages, such as XQuery and (as you will see) XPath and XPointer. Chief among these reasons — the reason why these specs’ authors almost always drop the use of purely XML-based syntax after first considering it — is that the verbosity is overwhelming. For instance, the W3C has prepared a Working Draft version (dated, as of this writing, June 2001) of something called XQueryX: a purely XML syntax representation of XQuery queries. Section 3 of this document provides examples of XQuery queries and their XQueryX counterparts; a typical XQuery query takes up seven lines, while the equivalent XQueryX form is 57 lines long.

Tip

If you’re interested in seeing some of these rather gruesome (in my opinion) examples for yourself, you can find the current version of the XQueryX standard at http://www.w3.org/TR/xqueryx.

Another problem with using purely XML syntax for general-purpose applications is namespaces. If queries (or path/pointer language expressions) had to use XML syntax, they’d need to include namespace qualifications to distinguish the queries, paths, and pointers from the surrounding document’s content, greatly increasing the complexity of any document that needed to use them. That’s why XPath and XPointer expressions are served up in attribute values and why XQuery’s counterparts appear in element content.

I don’t mean to imply here, as you will see, that you can ignore namespace issues in constructing path and pointer expressions. For instance, if you wish to locate an element with a particular name in a document, you must still carry, at least in the back of your head, the question, “Do I mean the name and its namespace prefix, if there is one, or just the name itself?” My point here relates strictly to the syntax of the general-purpose “querying” language itself. That said, XQuery’s use of specially delimited and formatted element content seems to me to fly in the face of XML’s classic emphasis on supplying meaning via markup (as opposed to embedding it in text strings outside the markup), in not entirely satisfactory ways.
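
A small sketch shows why the name-versus-qualified-name question matters in practice. It assumes Python’s xml.etree.ElementTree module, and the pets document and the namespace URI in it are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one hazard element is namespace-qualified, one is not.
DOC = """
<pets xmlns:h="http://example.com/ns/hazards">
   <h:hazard>hairballs</h:hazard>
   <hazard>underfoot instability</hazard>
</pets>
"""

root = ET.fromstring(DOC)

# The caller maps a prefix of its own choosing to the namespace URI;
# it need not be the prefix the document happens to use.
ns = {"hz": "http://example.com/ns/hazards"}

# The bare name matches only the element that is in no namespace.
unqualified = [e.text for e in root.findall("hazard")]

# The prefixed name matches only the element in the mapped namespace.
qualified = [e.text for e in root.findall("hz:hazard", ns)]

print(unqualified)
print(qualified)
```

The two searches return different elements even though both are “named” hazard, so the expression has to say which one it means.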



[1] Understanding certain XPath features seems to presume familiarity with such non-XML issues as how computers perform floating-point arithmetic and the dozens of ways in which legitimate Uniform Resource Identifiers (URIs) may be formed. I’d argue, though, that you don’t need an intimate, profound familiarity with those issues — just some common sense.

[2] For example, the XQuery snippet here includes a <p> and </p> start tag/end tag pair.
