Other Options

As XML has spread, more and more people have had creative (and often useful) ideas about how to process it.

XPath as API

The XPath language provides a convenient way to specify which nodes of a document tree to return. A parser written as a hybrid only needs to return the list of nodes that match an XPath expression. A stream parser searches efficiently through the document to find the nodes, then passes their locations to a tree builder that assembles them into object trees. XPath's advantage is that it offers a very rich language for specifying nodes, giving the developer a lot of control and flexibility. The parsers libxml2 and MSXML are two that come with XPath interfaces.
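As a rough illustration of asking for nodes by expression, here is a minimal sketch that uses the javax.xml.xpath package from the standard Java library rather than libxml2 or MSXML. The class name XPathSelect, the helper select, and the sample document are made up for this example. Note also that the JDK's implementation typically builds a full DOM tree before evaluating the expression, so it shows only the calling style, not the stream-then-build hybrid described above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library discussed in the text.
class XPathSelect {

    // Returns the text content of every node matched by the XPath expression.
    static List<String> select(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // Ask for a node set; the caller never walks the tree by hand.
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression,
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            // Covers both a malformed expression and an unparsable document.
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        String xml = "<library>"
                + "<book year=\"2003\"><title>First</title></book>"
                + "<book year=\"1999\"><title>Second</title></book>"
                + "</library>";
        // Select the titles of books whose year attribute is greater than 2000.
        System.out.println(select(xml, "/library/book[@year > 2000]/title"));
        // prints [First]
    }
}
```

The predicate in square brackets does the filtering that would otherwise take a hand-written loop over child nodes, which is the control and flexibility the paragraph above refers to.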

JDOM

Despite the name, JDOM is not merely a Java implementation of DOM. Rather, it is an alternative to SAX and DOM that is described by its developers as “lightweight and fast . . . optimized for the Java programmer.” It doesn’t actually replace other parsers, but uses them to build object representations of documents with an interface that is easy to manipulate. It is designed to integrate with SAX and DOM, supplying a simple and useful interface layer on top.

The proponents of JDOM say it is needed to reduce the complexity of the factory-based specifications for SAX and DOM. For that reason, the JDOM specification itself is defined with concrete classes rather than interfaces. In addition to supplying its own API, JDOM includes support for the XPath API described above.

Hybrids

If streams and trees are the two extremes ...
