Pull Parsing

Tim Bray, lead editor of XML 1.0, calls pull parsing “the way to go in the future.” Like event-based parsing, it’s fast, memory efficient, streamable, and read-only. The difference is in how the application and parser interact. SAX implements what we call push parsing. The parser pushes events at the program, requiring it to react. The parser doesn’t store any state information (contextual clues that would help it decide how to parse), so the application has to store this information itself.
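To make the push model concrete, here is a minimal sketch using the SAX API bundled with the JDK (`javax.xml.parsers`, `org.xml.sax`). The names `PushDemo`, `NameCollector`, and `collectNames` are hypothetical, chosen only for illustration. Notice that the handler must keep its own flag recording whether it is currently inside a `name` element: the parser simply calls the handler's methods as it encounters markup and remembers nothing on the program's behalf.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PushDemo {

    // The parser calls these methods; the handler must track context itself.
    static class NameCollector extends DefaultHandler {
        final List<String> names = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private boolean inName = false; // state the parser does not keep for us

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if (qName.equals("name")) {
                inName = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Character data may arrive in several chunks, so accumulate it.
            if (inName) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("name")) {
                names.add(text.toString().trim());
                inName = false;
            }
        }
    }

    // Parses the document in one call; the parser drives, the handler reacts.
    public static List<String> collectNames(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        NameCollector handler = new NameCollector();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog><product id=\"ronco-728\"><name>Widget</name></product>"
                   + "<product id=\"acme-229\"><name>Gizmo</name></product></catalog>";
        System.out.println(collectNames(xml)); // prints [Widget, Gizmo]
    }
}
```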

Pull parsing is just the opposite. The program takes control and tells the parser when to fetch the next item. Instead of reacting to events, it proactively seeks out events. This gives the developer more freedom in designing data handlers and a greater ability to catch invalid markup. Consider the following example XML:

<catalog>
  <product id="ronco-728">
    <name>Widget</name>
    <price>19.99</price>
  </product>
  <product id="acme-229">
    <name>Gizmo</name>
    <price>28.98</price>
  </product>
</catalog>

It is easy to write a SAX program to read this XML and build a data structure. The following code assembles an array of products, each an instance of this class:

class Product {
    String name;
    String price;
}

Here is the code to do it:

StringBuffer cdata = new StringBuffer();
Product[] catalog = new Product[10];
String name;
Float price;

public void startDocument() {
    index = 0;
}

public void startElement( String uri, String local, String raw,
                          Attributes attrs ) throws SAXException {
    ...
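For comparison, here is a minimal sketch of the same task written in the pull style, assuming the StAX API (`javax.xml.stream`) that ships with Java 6 and later. It is not the book's code: `PullDemo` and `readCatalog` are hypothetical names, a `List` replaces the fixed-size array, and prices are kept as strings to match the `Product` class above. The loop calls `next()` only when it is ready for more input, and `getElementText()` pulls an element's text in a single call instead of buffering `characters()` chunks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PullDemo {

    // Mirrors the Product class above, with price stored as a String.
    static class Product {
        String name;
        String price;
    }

    public static List<Product> readCatalog(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<Product> catalog = new ArrayList<>();
        Product current = null;
        try {
            // The program, not the parser, decides when to advance.
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue; // skip whitespace, end tags, etc.
                }
                String tag = reader.getLocalName();
                if (tag.equals("product")) {
                    current = new Product();
                    catalog.add(current);
                } else if (tag.equals("name") && current != null) {
                    // Reads text up to </name>; throws if a child element appears.
                    current.name = reader.getElementText().trim();
                } else if (tag.equals("price") && current != null) {
                    current.price = reader.getElementText().trim();
                }
            }
        } finally {
            reader.close();
        }
        return catalog;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<catalog>"
                + "<product id=\"ronco-728\"><name>Widget</name><price>19.99</price></product>"
                + "<product id=\"acme-229\"><name>Gizmo</name><price>28.98</price></product>"
                + "</catalog>";
        for (Product p : readCatalog(xml)) {
            System.out.println(p.name + " " + p.price);
        }
    }
}
```

Because the program drives the parse, it can enforce its expectations as it goes: `getElementText()` throws an `XMLStreamException` if, for example, a `<name>` element unexpectedly contains a child element.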

Get Learning XML, 2nd Edition now with the O’Reilly learning platform.