How Do I Use It?

All of the great ideas XML has brought to us are not much use without tools for applying them within our familiar programming environments. Luckily, XML has been paired with Java since its inception, and Java boasts the most complete set of APIs available for using XML directly within Java code. While C, C++, and Perl are quickly catching up, Java continues to set the standard for how to use XML from applications. From an application’s point of view, there are two basic stages in an XML document’s lifecycle, as shown in Figure 1-1. First, the document is parsed, and then the data within it is manipulated.

Figure 1-1. The application view of an XML document lifecycle

As Java developers, we are fortunate to have simple ways to handle these tasks and more.

SAX

SAX is the Simple API for XML. It provides an event-based framework for parsing XML data, which is the process of reading through the document and breaking down the data into usable parts; at each step of the way, SAX defines events that can occur. For example, SAX defines an org.xml.sax.ContentHandler interface that defines methods such as startDocument( ) and endElement( ). Implementing this interface allows complete control over these portions of the XML parsing process. There are similar interfaces for handling errors and lexical constructs. A set of errors and warnings is defined, allowing handling of the various situations that can occur in XML parsing, such as an invalid document, or one that is not well-formed. Behavior can be added to customize the parsing process, so that very application-specific tasks can be defined, all through a standard interface into XML documents. For the SAX API documentation and other information on SAX, visit http://www.megginson.com/SAX.
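To make the event model concrete, here is a minimal sketch of a handler that records each callback as it fires. It extends org.xml.sax.helpers.DefaultHandler, a convenience class that supplies empty implementations of the ContentHandler methods. The class name SaxEventLogger and the sample document are made up for illustration, and the parser is obtained through the JAXP factory (discussed later in this chapter) on the assumption that the JDK's bundled parser is acceptable.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: records the SAX events fired while parsing.
final class SaxEventLogger extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement:" + qName);
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    // Parses the given XML text and returns the events in the order they occurred.
    static List<String> parse(String xml) {
        try {
            SaxEventLogger handler = new SaxEventLogger();
            // Assumption: whatever parser the JAXP factory finds is good enough here.
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.events;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        for (String event : parse("<book><title>Java and XML</title></book>")) {
            System.out.println(event);
        }
    }
}
```

The handler never pulls data from the parser; the parser pushes events to it in document order, which is why application logic in SAX lives inside the callbacks.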

Before continuing, it is important to clear up a common misconception about SAX. SAX is often mistaken for an XML parser. We even discuss SAX here as providing a means to parse XML data. However, SAX provides a framework for parsers to use, and defines events within the parsing process to monitor. A parser must be supplied to SAX to perform any XML parsing. This has resulted in many excellent parsers being made available in Java, such as Sun’s Project X, the Apache Software Foundation’s Xerces, Oracle’s XML Parser, and IBM’s XML4J. These can all be plugged into the SAX APIs and result in parsed XML data. SAX APIs provide the means to parse a document, not the XML parser itself.
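Whichever parser is plugged in, it reports problems through the same SAX error-handling callbacks. The sketch below collects what a parser reports for a document that is not well-formed. The class and method names are invented for this example, and the parser is again assumed to come from the JAXP factory rather than from a named vendor class.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: gathers the problems a SAX parser reports.
final class SaxProblemCollector {

    // Returns one entry per reported problem; an empty list means none were reported.
    static List<String> collectProblems(String xml) {
        final List<String> problems = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void warning(SAXParseException e) {
                problems.add("warning: " + e.getMessage());
            }

            @Override
            public void error(SAXParseException e) {
                // Recoverable errors, such as validity violations when validating.
                problems.add("error: " + e.getMessage());
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                // Well-formedness violations; rethrow so that parsing stops here.
                problems.add("fatalError: " + e.getMessage());
                throw e;
            }
        };
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Normally already recorded by fatalError(); record it if it was not.
            if (problems.isEmpty()) {
                problems.add("fatalError: " + e.getMessage());
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Parser could not be set up or run", e);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(collectProblems("<a><b/></a>"));
        System.out.println(collectProblems("<a><b></a>"));
    }
}
```

The exact wording of each message depends on the parser in use, so the sketch only relies on which callback was invoked, not on the message text.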

DOM

DOM is an API for the Document Object Model. While SAX only provides access to the data within an XML document, DOM is designed to provide a means of manipulating that data. DOM provides a representation of an XML document as a tree. Because a tree is an age-old data representation, traversal and manipulation of tree structures are easy to accomplish in programming languages, Java being no exception. DOM also reads an entire XML document into memory, storing all the data in nodes, so the entire document is very fast to access; it is all in memory for the length of its existence in the DOM tree. Each node represents a piece of the data pulled from the original document.
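As a rough illustration of the tree model, the following sketch parses a small document into a DOM tree, walks it recursively, and then adds a node in memory. The class name, element names, and sample data are all invented for the example, and the DOM parser is assumed to be the one the JAXP DocumentBuilderFactory supplies.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class: traverses and modifies a DOM tree.
final class DomTreeDemo {

    // Reads the whole document into memory and returns the root of the tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not build DOM tree", e);
        }
    }

    // Recursively counts the element nodes in the subtree rooted at the given node.
    static int countElements(Node node) {
        int count = (node.getNodeType() == Node.ELEMENT_NODE) ? 1 : 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countElements(child);
        }
        return count;
    }

    // Appends a new <book> element with the given title to the document's root element.
    static void addBook(Document doc, String title) {
        Element book = doc.createElement("book");
        book.setTextContent(title);
        doc.getDocumentElement().appendChild(book);
    }

    public static void main(String[] args) {
        Document doc = parse("<library><book>SAX</book><book>DOM</book></library>");
        System.out.println("Elements before: " + countElements(doc));
        addBook(doc, "JAXP");
        System.out.println("Elements after: " + countElements(doc));
    }
}
```

Text content such as "SAX" is stored in its own text node beneath each element, which is why the traversal checks the node type before counting.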

There is a significant drawback to DOM, however. Because DOM reads an entire document into memory, resources can become very heavily taxed, often slowing down or even crippling an application. The larger and more complex the document, the more pronounced this performance degradation becomes. Keep in mind that while DOM is a good, prevalent means of manipulating XML data, it is not the only means of accomplishing this task. We will spend time using DOM, and we will also write code that manipulates data straight from SAX. Your application requirements will most likely define which solution is correct for your specific development project. To read the DOM recommendations at W3C, go to http://www.w3.org/DOM in your web browser.

JAXP

JAXP is Sun’s Java API for XML Parsing. A relatively new addition to the XML developer’s arsenal, it attempts to provide cohesiveness to the SAX and DOM APIs. While it does not compete with or replace either of these APIs, it does add some convenience methods to try to make the XML APIs easier to use for Java developers. It conforms to the SAX and DOM specifications, as well as adhering to the namespace Recommendation we discussed earlier. JAXP does not redefine SAX or DOM behavior, but ensures that all XML-conformant parsers can be accessed within Java applications through a standard pluggability layer.
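The pluggability layer can be sketched as follows, assuming the javax.xml.parsers factory classes as they ship with current JDKs. The application asks a factory for a parser and never names a vendor class; the class name JaxpFactoryDemo and the urn:example:books namespace are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class: obtains a DOM parser through the JAXP factory.
final class JaxpFactoryDemo {

    // Returns the root element's name in the form {namespace-uri}local-name.
    static String describeRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace processing is off by default, so it is requested explicitly.
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            return "{" + root.getNamespaceURI() + "}" + root.getLocalName();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // The concrete class printed here depends on which parser is installed.
        System.out.println(DocumentBuilderFactory.newInstance().getClass().getName());
        System.out.println(describeRoot("<b:book xmlns:b=\"urn:example:books\"/>"));
    }
}
```

Because the code depends only on the factory, a different conformant parser can be substituted without recompiling, for example by setting the javax.xml.parsers.DocumentBuilderFactory system property to another implementation's factory class.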

It is expected that JAXP will continue to evolve as both SAX and DOM go through revision. It is also assumed that JAXP will eventually be part of other Sun specifications, as both the Tomcat servlet engine and the EJB 1.1 specification require XML-formatted configuration and deployment files. Although the J2EE™ 1.3 and J2SE™ 1.4 specifications do not mention JAXP explicitly, they are expected to have integrated JAXP support as well. For the complete JAXP specification, go to http://java.sun.com/xml.

These three APIs make up the Java developer’s toolkit for handling XML. While this is not a formal designation, these three APIs do give us the mechanisms to get XML data and manipulate it, all within normal Java code. These APIs will be our workhorses throughout the book, and we will learn to use every aspect of the classes that each provides.
