Chapter 1. Introduction to XQuery

This chapter provides background on the purpose and capabilities of XQuery. It also gives a quick introduction to the features of XQuery that are covered in more detail later in the book. It is designed to provide a basic familiarity with the most commonly used kinds of expressions, without getting too bogged down in the details.

What Is XQuery?

The use of XML has exploded in recent years. An enormous amount of information is now stored in XML, both in XML databases and in documents on a filesystem. This includes highly structured data such as sales figures, semistructured data such as product catalogs and yellow pages, and relatively unstructured data such as letters and books. Even more information is passed between systems as transitory XML documents.

All of this data is used for a variety of purposes. For example, sales figures may be useful for compiling financial statements that may be published on the Web, reporting results to the tax authorities, calculating bonuses for salespeople, or creating internal reports for planning. For each of these uses, we are interested in different elements of the data and expect it to be formatted and transformed according to our needs.

XQuery is a query language designed by the W3C to address these needs. It allows you to select the XML data elements of interest, reorganize and possibly transform them, and return the results in a structure of your choosing.

Capabilities of XQuery

XQuery has a rich set of features that allow many different types of operations on XML data and documents, including:

  • Selecting information based on specific criteria

  • Filtering out unwanted information

  • Searching for information within a document or set of documents

  • Joining data from multiple documents or collections of documents

  • Sorting, grouping, and aggregating data

  • Transforming and restructuring XML data into another XML vocabulary or structure

  • Performing arithmetic calculations on numbers and dates

  • Manipulating strings to reformat text

As you can see, XQuery can be used not just to extract sections of XML documents, but also to manipulate and transform the results. One capability that XQuery 1.0 does not provide is updates, which would be particularly useful in the case of XML data stored in databases. Updates are addressed by a separate W3C specification, the XQuery Update Facility, rather than by XQuery 1.0 itself.
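
As a rough illustration of a few of the capabilities listed above (selecting, filtering, sorting, and restructuring), the following sketch queries a small, made-up catalog. The element and attribute names (catalog, product, dept, number, name) and the data itself are assumptions chosen for this example. A real query would more typically read its input with the doc function than embed it in the query.

```xquery
(: Hypothetical input data, embedded so the query is self-contained. :)
let $catalog :=
  <catalog>
    <product dept="WMN"><number>557</number><name>Fleece Pullover</name></product>
    <product dept="ACC"><number>563</number><name>Floppy Sun Hat</name></product>
    <product dept="ACC"><number>443</number><name>Deluxe Travel Bag</name></product>
  </catalog>
(: Select each product, keep only the ACC department, and sort by name. :)
for $prod in $catalog/product
where $prod/@dept = "ACC"
order by $prod/name
(: Restructure each result into a new element of our own design. :)
return <item num="{$prod/number}">{data($prod/name)}</item>
```

Run in a conforming XQuery 1.0 processor, this should return two item elements, for products 443 and 563 in that order: only the ACC products pass the where clause, and they are ordered by name.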

Uses for XQuery

There are as many reasons to query XML as there are reasons to use XML. Some examples of common uses for the XQuery language are:

  • Extracting information from a relational database for use in a web service

  • Generating reports on data stored in a database for presentation on the Web as XHTML

  • Searching textual documents in a native XML database and presenting the results

  • Pulling data from databases or packaged software and transforming it for application integration

  • Combining content from traditionally non-XML sources to implement content management and delivery

  • Ad hoc querying of standalone XML documents for the purposes of testing or research

Processing Scenarios

XQuery's sweet spot is querying bodies of XML content that are stored in databases. For this reason, it is sometimes called the "SQL of XML." Some of the earliest XQuery implementations were in native XML database products. The term "native XML database" generally refers to a database that is designed for XML content from the ground up, as opposed to a traditional relational database. Rather than being oriented around tables and columns, its data model is based on hierarchical documents and collections of documents.

Native XML databases are most often used for narrative content and other data that is less predictable than what you would typically store in a relational database. Examples of native XML database products that support XQuery are Berkeley DB XML, eXist (which is open source), MarkLogic Server, TigerLogic XDMS, and X-Hive/DB. These products provide the traditional capabilities of databases, such as data storage, indexing, querying, loading, extracting, backup, and recovery. Most of them also offer features beyond their core database capabilities, such as advanced full-text searching, document conversion services, or end-user interfaces.

Major relational database products, including Oracle 10g, IBM DB2 9, and Microsoft SQL Server 2005, also support XML and XQuery. Early implementations of XML in relational databases involved storing XML in table columns as blobs or character strings and providing query access to those columns. However, these vendors are increasingly blurring the line between native XML databases and relational databases with new features that allow you to store XML natively.

Other XQuery processors are not embedded in a database product, but work independently. They might be used on physical XML documents stored as files on a filesystem or on the Web. They might also operate on XML data that is passed in memory from some other process. The most notable product in this category is Saxon, which has both open source and commercial versions.
