Input Documents

A single query can access many input documents. The term input document is used in this book to mean any XML data that is being queried. Technically, it might not be an entire XML document; it might be a document fragment, such as an element or sequence of elements, possibly with children. Alternatively, it might not be a physical XML file at all; it might be data retrieved from an XML database, or an in-memory XML representation that was generated from non-XML data.

If the input document is physically stored in XML syntax, it must be well-formed XML. This means that it must comply with XML syntax rules. For example, every start tag must have a matching end tag, elements must not overlap, and special characters such as < and & must be escaped appropriately. It must also use namespaces appropriately: if a colon is used in an element or attribute name, the part before the colon must be a prefix that is mapped to a namespace by a namespace declaration.

Whether or not it is physically stored as an XML document, an input document must conform to other constraints on XML documents. For example, an element may not have two attributes with the same name. Also, element and attribute names may not contain special characters other than hyphens, underscores, and periods, apart from the colon that separates a namespace prefix.

There are four ways to access input documents from within a query; they are described in the next four sections.

Accessing a Single Document

The doc function can be used to open one input document based on its URI. It takes a single URI, as a string, as its argument and returns the document node of the resource associated with that URI.

Implementations interpret the URI passed to the doc function in different ways. Some, such as Saxon, dereference the URI; that is, they go out to the URL and retrieve the resource at that location. For example, using Saxon:

doc("http://datypic.com/order.xml")

will return the document node of the document that can be found at the URL http://datypic.com/order.xml.

Other implementations, such as those embedded in XML databases, treat URIs simply as names. The processor might take the name and look it up in an internal catalog to find the document associated with that name. The doc function is covered in detail in Appendix A.
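The two interpretations can be illustrated with a short Java sketch of a hypothetical resolver. The class DocResolver, its register and doc methods, and the name orders/order1 are illustrative assumptions and not the API of any real XQuery processor. The resolver first looks the URI up as a name in an internal catalog. If that fails, it falls back to dereferencing the URI with the standard JAXP method DocumentBuilder.parse(String), which fetches and parses the resource at that location.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical resolver showing two ways a processor might interpret the URI
// passed to doc(): as a name in an internal catalog, or as a URL to dereference.
public class DocResolver {
    private final Map<String, Document> catalog = new HashMap<>();
    private final DocumentBuilder builder;

    public DocResolver() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // honor namespace declarations
            builder = factory.newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Store a document under a name, as an XML database might when it loads it.
    public void register(String name, String xml) {
        try {
            catalog.put(name, builder.parse(new InputSource(new StringReader(xml))));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML: " + name, e);
        }
    }

    // Treat the URI as a catalog name first; otherwise fetch it as a URL.
    public Document doc(String uri) {
        Document named = catalog.get(uri);
        if (named != null) {
            return named;
        }
        try {
            return builder.parse(uri); // dereferences the URI and parses the result
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot retrieve " + uri, e);
        }
    }

    public static void main(String[] args) {
        DocResolver resolver = new DocResolver();
        resolver.register("orders/order1", "<order num=\"1\"><item/></order>");
        // Prints the root element name of the registered document: order
        System.out.println(resolver.doc("orders/order1").getDocumentElement().getTagName());
    }
}
```

Because a registered name is looked up before any network access, the same call behaves like the database-style lookup for known names and like the Saxon-style fetch for everything else.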

Accessing a Collection

The collection function returns the nodes that make up a collection. A collection may be a sequence of nodes of any kind, identified by a URI. Exactly how the URI is associated with the nodes is defined by the implementation. For example, one implementation might accept a URI that is the name of a directory on a filesystem, and return the document nodes of the XML documents stored in that directory. Another implementation might associate a URI with a particular database. A third might allow you to specify the URI of an XML document that contains URIs for all the XML documents in the collection.

The function takes as an argument a single URI. For example, the function call:

collection("http://datypic.com/orders")

might return the document nodes of all the XML documents associated with the collection http://datypic.com/orders. It is also possible to call the function with no arguments, as in collection(), to retrieve a default collection as defined by the implementation.
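The directory-based strategy described above might look something like the following minimal Java sketch. The class DirectoryCollection, its methods, and the sample file names are hypothetical. A real implementation decides for itself how collection URIs map to directories and in what order the documents are returned. The sketch treats every *.xml file in a directory as one document node of the collection and ignores other files. It needs Java 11 or later for Files.writeString.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical mapping of a collection URI to a filesystem directory:
// each *.xml file in the directory becomes one document in the collection.
public class DirectoryCollection {
    public static List<Document> collection(Path directory) {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(directory, "*.xml")) {
            for (Path file : stream) {
                files.add(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        files.sort(null); // listing order is unspecified; sort by path for stable output
        List<Document> documents = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            for (Path file : files) {
                documents.add(builder.parse(file.toFile()));
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse a document in " + directory, e);
        }
        return documents;
    }

    // Builds a throwaway directory with two XML files and one non-XML file.
    static Path createSampleDirectory() {
        try {
            Path dir = Files.createTempDirectory("orders");
            Files.writeString(dir.resolve("order1.xml"), "<order num=\"1\"/>");
            Files.writeString(dir.resolve("order2.xml"), "<order num=\"2\"/>");
            Files.writeString(dir.resolve("notes.txt"), "not part of the collection");
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Only the two .xml files are included, so this prints 2
        System.out.println(collection(createSampleDirectory()).size());
    }
}
```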

Important

Some XQuery implementations support a function called input, with no arguments. This function appeared in earlier drafts of the XQuery recommendation but is no longer part of the standard. It is equivalent to calling the collection function with no arguments.

Setting the Context Node Outside the Query

The context node can be set by the processor outside the query. In this case, it may not be necessary to use the doc or collection functions, unless you want to open secondary data sources.

For example, a hypothetical XQuery implementation might allow you to set the context node in the Java code that executes the query, as in:

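// Hypothetical API: Document and evaluateXQuery are illustrative, not a real library
Document catalogDocument = new Document(new File("catalog.xml"));
String query = "catalog/product[@dept = 'ACC']";
// The processor evaluates the query with catalogDocument as the context node
List productElements = catalogDocument.evaluateXQuery(query);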

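In that case, the path expression catalog/product[@dept = 'ACC'] might be evaluated with the catalog document node itself as the context node, so that catalog selects the root element of the document. If the processor had not set the context node, a path expression starting with catalog/product would raise an error, because a relative path needs a context node to start from.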

Another implementation might allow you to select a document to query in a user interface, in which case it uses that document as the context node.
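The Java standard library does not include an XQuery processor. However, its XPath 1.0 API (javax.xml.xpath) lets host code supply the context node in much the same way as the hypothetical example above. The sketch below evaluates the same path expression against a parsed document that is passed in as the context item. The class name, method name, and catalog content are made up for illustration; this is XPath rather than XQuery, although this particular path expression is valid in both.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Host code parses the document and passes it as the context node, so the
// relative path in the query is resolved against that document node.
public class ContextNodeDemo {
    // Made-up sample data standing in for catalog.xml
    static final String CATALOG =
        "<catalog>"
        + "<product dept=\"WMN\"><name>Pullover</name></product>"
        + "<product dept=\"ACC\"><name>Sun Hat</name></product>"
        + "<product dept=\"ACC\"><name>Travel Bag</name></product>"
        + "</catalog>";

    static NodeList selectProducts(String xml, String query) {
        try {
            Document catalogDocument = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // catalogDocument is the context node for the relative path in query
            return (NodeList) xpath.evaluate(query, catalogDocument, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        NodeList products = selectProducts(CATALOG, "catalog/product[@dept = 'ACC']");
        // Prints "Sun Hat" and then "Travel Bag"
        for (int i = 0; i < products.getLength(); i++) {
            System.out.println(products.item(i).getTextContent());
        }
    }
}
```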

Using Variables

The processor can bind external variables to input documents or document fragments. These variables can then be used in the query to access the input document. For example, an implementation might allow an external variable named $input to be defined, and allow the user to specify a document to be associated with that variable. The hypothetical query processor could be invoked from the command line using:

xquery -input catalog.xml

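and the query could use expressions like $input/catalog/product to retrieve the product elements. A standard XQuery query would normally also declare such a variable in its prolog, using a declaration like declare variable $input external; The name $input is provided as an example; the implementation could use any name for the variable.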
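On the host side, binding a document to a variable can be sketched with the JDK's XPath 1.0 API. This is an analog, not an XQuery processor. The class and method names are hypothetical, and the variable name input simply mirrors the example above. The javax.xml.xpath interfaces do not spell out which Java types an XPathVariableResolver may return. This sketch therefore relies on the JDK's default implementation accepting a DOM Node as a variable value. No context item is passed, so the expression can reach the document only through $input.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Host code binds the variable $input to a parsed document; the expression
// then accesses the document through that variable instead of a context node.
public class ExternalVariableDemo {
    static int countProducts(String xml) {
        try {
            Document input = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Resolve $input to the document; any other variable name is unbound
            xpath.setXPathVariableResolver((QName name) ->
                "input".equals(name.getLocalPart()) ? input : null);
            // A null item means no context node is supplied by the host
            Double count = (Double) xpath.evaluate(
                "count($input/catalog/product)", (Object) null, XPathConstants.NUMBER);
            return count.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String catalog = "<catalog><product/><product/><product/></catalog>";
        System.out.println(countProducts(catalog)); // prints 3
    }
}
```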

You should consult the documentation for your XQuery implementation to determine which of these four methods it supports for accessing input documents.
