Chapter 20. Simple API for XML (SAX)

The Simple API for XML (SAX) is an event-based API for reading XML documents. Many different XML parsers implement the SAX API, including Xerces, Crimson, the Oracle XML Parser for Java, and Ælfred. SAX was originally defined as a Java API and is primarily intended for parsers written in Java. Therefore, this chapter focuses on the Java version of the API. However, SAX has been ported to most other major object-oriented languages, including C++, Python, Perl, and Eiffel. The translation from Java is usually fairly obvious.

The SAX API is unusual among XML APIs because it’s an event-based push model rather than a tree-based pull model. As the XML parser reads an XML document, it sends the program information from the document in real time. Each time the parser sees a start-tag, an end-tag, character data, or a processing instruction, it tells your program. The document is presented to your program one piece at a time from beginning to end. You can either save the pieces you’re interested in until the entire document has been read, or process the information as soon as you receive it. You do not have to wait for the entire document to be read before acting on the data at the beginning of the document. Most importantly, the entire document does not have to reside in memory. This feature makes SAX the API of choice for very large documents that do not fit into available memory.

Tip

This chapter covers SAX2 exclusively. In 2004, all major parsers that ...

Get XML in a Nutshell, 3rd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.