Path Expressions

The most straightforward kind of query simply selects elements or attributes from an input document. This type of query is known as a path expression. For example, the path expression:

doc("catalog.xml")/catalog/product

will select all the product elements from the catalog.xml document.

Path expressions are used to traverse an XML tree to select elements and attributes of interest. They are similar to paths used for filenames in many operating systems. They consist of a series of steps, separated by slashes, that traverse the elements and attributes in the XML documents. In this example, there are three steps:

  1. doc("catalog.xml") calls the built-in XQuery function doc, passing it the URI (here, a file name) of the document to open

  2. catalog selects the catalog element, the outermost element of the document

  3. product selects all the product children of catalog
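The three steps can also be written one per line, because whitespace and comments are allowed between steps. The following sketch is the same query as above, annotated with XQuery comments, which are delimited by (: and :).

```xquery
doc("catalog.xml")   (: step 1: open the document :)
  /catalog           (: step 2: its outermost element, catalog :)
  /product           (: step 3: every product child of catalog :)
```

Each step is evaluated against the results of the step before it, which is why the order of the steps matters.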

The result of the query will be the four product elements, exactly as they appear (with the same attributes and contents) in the input document. Example 1-4 shows the complete result.

Example 1-4. Four product elements selected from the catalog

  <product dept="WMN">
    <number>557</number>
    <name language="en">Fleece Pullover</name>
    <colorChoices>navy black</colorChoices>
  </product>
  <product dept="ACC">
    <number>563</number>
    <name language="en">Floppy Sun Hat</name>
  </product>
  <product dept="ACC">
    <number>443</number>
    <name language="en">Deluxe Travel Bag</name>
  </product>
  <product dept="MEN">
    <number>784</number>
    <name language="en">Cotton Dress Shirt</name>
    <colorChoices>white gray</colorChoices>
    <desc>Our <i>favorite</i> shirt!</desc>
  </product>
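A path can be extended with further steps in the same way. As a sketch against the catalog shown in Example 1-4, adding a name step selects the name children of the products rather than the products themselves:

```xquery
(: select the name child of every product in the catalog :)
doc("catalog.xml")/catalog/product/name
```

For the four products in Example 1-4, this would return the four name elements (Fleece Pullover, Floppy Sun Hat, Deluxe Travel Bag, and Cotton Dress Shirt), each with its language attribute.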

Path expressions can also return attributes, using the @ symbol. For example, the path expression:

doc("catalog.xml")/*/product/@dept

will return the four dept attributes in the input document. The asterisk (*) can be used as a wildcard to indicate any element name. In this example, the path will return the dept attributes of any product children of the outermost element, regardless of the outermost element's name. Alternatively, you can use a double slash (//) to reach product elements that appear anywhere in the catalog document, at any depth, as in:

doc("catalog.xml")//product/@dept
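The double slash and the @ symbol combine with other element names in the same way. The following sketch is an illustration rather than one of the book's own examples. It selects the language attribute of every name element, wherever it appears in the document:

```xquery
(: language attributes of name elements at any depth :)
doc("catalog.xml")//name/@language
```

Assuming the document contains no name elements beyond those shown in Example 1-4, this would select four language attributes, each with the value en.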

In addition to traversing the XML document, a path expression can contain predicates that filter out elements or attributes that do not meet a particular criterion. Predicates are indicated by square brackets. For example, the path expression:

doc("catalog.xml")/catalog/product[@dept = "ACC"]

contains a predicate. It selects only those product elements whose dept attribute value is ACC.
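A predicate does not have to come at the end of the path. As a sketch, the step that carries the predicate can be followed by further steps, which then apply only to the elements that passed the filter:

```xquery
(: names of the products in the ACC department only :)
doc("catalog.xml")/catalog/product[@dept = "ACC"]/name
```

For the catalog in Example 1-4, this would return two name elements: Floppy Sun Hat and Deluxe Travel Bag.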

When a predicate consists of a number, it serves as an index into the selected items, counting from 1. For example:

doc("catalog.xml")/catalog/product[2]

will return the second product element in the catalog.
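A step may carry more than one predicate. They are applied from left to right, so a numeric predicate that follows a filter counts positions among the items that passed the filter. A brief sketch:

```xquery
(: the second product among those whose dept is ACC :)
doc("catalog.xml")/catalog/product[@dept = "ACC"][2]
```

For the catalog in Example 1-4, this would return the product numbered 443 (Deluxe Travel Bag). That product is the third product in the catalog overall but the second in the ACC department.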

Path expressions are convenient because of their compact, easy-to-remember syntax. However, they have a limitation: they can only return elements and attributes as they appear in input documents. Any elements selected in a path expression appear in the results with the same names, the same attributes and contents, and in the same order as in the input document. When you select the product elements, you get them with all of their children and with their dept attributes. Path expressions are covered in detail in Chapter 4.
