Tree-based XML Processing

XML documents, because of the requirements for well-formedness, can be readily described using tree structures. Elements are inherently hierarchical, as they may contain other elements, text content, comments, and so forth.
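As a small illustration of that hierarchy, the sketch below parses a short document with Python's standard xml.dom.minidom module and lists the kinds of nodes found directly inside the root element. The sample document and the child_kinds helper are invented for this example. They show one possible tree view, not the only one.

```python
from xml.dom import minidom, Node

# Hypothetical sample: an element containing a comment, a child element, and text.
SAMPLE = "<person><!-- note --><name>Alan Turing</name>, born 1912</person>"

# Human-readable labels for the node types this sample uses.
KIND = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
}

def child_kinds(xml_text):
    """Return the kind of each direct child of the document's root element."""
    root = minidom.parseString(xml_text).documentElement
    return [KIND.get(child.nodeType, "other") for child in root.childNodes]

print(child_kinds(SAMPLE))  # ['comment', 'element', 'text']
```

The root element holds three children of three different kinds, and the name element in turn holds its own text node one level further down.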

Many tree models exist for XML documents. The XPath data model (described in Chapter 9), used in XSLT transformations, differs slightly from the model behind the Document Object Model (DOM) API, and both differ from the XML Information Set (Infoset), another W3C project. XML Schema (described in Chapter 17 and Chapter 22) defines a Post-Schema Validation Infoset (PSVI), which carries more information (derived from the schema) than any of the others.

Developers who want to manipulate documents from their programs typically use APIs that provide access to an object model representing the XML document. Tree-based APIs typically present a model of an entire document to an application once parsing has successfully concluded. Applications don’t have to maintain parsing context themselves or deal with partially processed input when a parse error occurs, because the tree-based parser generally handles errors on its own. Rather than following a stream of events, an application can just navigate through the tree to find the desired pieces of a document.
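The following sketch, again using Python's xml.dom.minidom, shows what this looks like in practice under a few assumptions: the catalog document, its element names, and the titles_by_id and try_parse helpers are all hypothetical. The application receives a complete tree and navigates it directly. A malformed document produces an error from the parser and no tree at all.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample document used only for this illustration.
CATALOG = """<catalog>
  <book id="b1"><title>First Title</title></book>
  <book id="b2"><title>Second Title</title></book>
</catalog>"""

def titles_by_id(xml_text):
    """Parse the whole document, then walk the tree to map book ids to titles."""
    doc = minidom.parseString(xml_text)
    result = {}
    for book in doc.getElementsByTagName("book"):
        title = book.getElementsByTagName("title")[0]
        # The title's only child is a text node; .data holds its characters.
        result[book.getAttribute("id")] = title.firstChild.data
    return result

def try_parse(xml_text):
    """Return a DOM document, or None if the input is not well-formed."""
    try:
        return minidom.parseString(xml_text)
    except ExpatError:
        # minidom reports well-formedness errors this way; no partial tree is returned.
        return None

print(titles_by_id(CATALOG))
print(try_parse("<catalog><book></catalog>"))  # mismatched tags
```

The application code never tracks which element is currently open. It asks the finished tree for the nodes it wants, and a well-formedness error surfaces as a single exception rather than as a half-processed stream of events.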

Working with a tree model has substantial advantages. The entire document is always available, and moving well-balanced portions of a document from one place to another ...
