You want to read an XML document as a fast stream. Your dataset is too large for DOM, and you want a more selective API than SAX offers.
Use the StAX API in Java SE 6 to “pull” parse your document.
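As a rough illustration of what pull parsing looks like, here is a minimal sketch using the cursor-style `XMLStreamReader` from `javax.xml.stream`. The class name `StaxPullSketch`, the `titles` helper, and the sample catalog document are assumptions invented for this example. The point is that the application, not the parser, decides when to advance and which events to act on.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; not part of any library.
class StaxPullSketch {

    // Collects the text of every <title> element and ignores everything else.
    static List<String> titles(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> found = new ArrayList<String>();
        try {
            // The loop "pulls" the next event only when we ask for it.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // Reads the text content and leaves the cursor on </title>.
                    found.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return found;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Sample document made up for this sketch.
        String xml = "<catalog>"
                   + "<book><title>Hamlet</title><price>9.99</price></book>"
                   + "<book><title>Macbeth</title><price>7.50</price></book>"
                   + "</catalog>";
        System.out.println(titles(xml)); // prints [Hamlet, Macbeth]
    }
}
```

For a real multi-gigabyte file you would presumably hand the factory an `InputStream` instead of a `StringReader`; the loop itself would not change.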
Java has given us a number of ways to work with XML documents, including the popular DOM and SAX. The most recent addition is StAX, or Streaming API for XML, which is largely the brainchild of Oracle/BEA. While all three of these methods of parsing XML have advantages, they have shortcomings too.
StAX is generally the most efficient of the three for streaming through XML, and is therefore particularly well suited to demanding tasks such as data binding and processing SOAP messages. Oracle/BEA’s WebLogic 9 and 10 use this parser internally within the application server, as does GlassFish v2.
DOM offers an easy-to-use API, and has an advantage over SAX and StAX in that it is XPath-capable. But it also forces you to read the entire document into memory. This is fine for small documents, but it can degrade performance for sizeable documents and can be prohibitive for very large ones. One European bank network regularly transfers multi-gigabyte XML files within its SOA; it is not using DOM to deal with them.
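To make the trade-off concrete, the following sketch loads a whole document into a DOM tree and then queries it with XPath through the standard `javax.xml.xpath` API. The class name `DomXPathSketch`, the `firstExpensiveTitle` helper, and the sample catalog are assumptions made up for illustration. Note that `builder.parse` must finish building the entire tree in memory before the XPath expression can run, which is exactly what becomes costly for very large inputs.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of any library.
class DomXPathSketch {

    // Returns the title of the first book priced above 8, or "" if none matches.
    static String firstExpensiveTitle(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // The whole document is materialized as a tree here.
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // With no return type given, evaluate() returns the string value of the first match.
        return xpath.evaluate("/catalog/book[price > 8]/title", doc);
    }

    public static void main(String[] args) throws Exception {
        // Sample document made up for this sketch.
        String xml = "<catalog>"
                   + "<book><title>Hamlet</title><price>9.99</price></book>"
                   + "<book><title>Macbeth</title><price>7.50</price></book>"
                   + "</catalog>";
        System.out.println(firstExpensiveTitle(xml)); // prints Hamlet
    }
}
```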
SAX, on the other hand, handles this problem by working as a “push” parser; that is, events are generated for each structure the parser encounters within the document, and the programmer can choose to deal with those he’s interested ...