Why Not to Choose SAX?

No API solves every problem by itself, and SAX has avoided the kitchen-sink syndrome better than many others. So there will be times when it's clear that SAX isn’t the whole answer for a particular application processing stage, even when you have the option to choose it. It will often still be the right way to get data into or out of another processing stage, particularly since many other APIs can interface with SAX. Building custom data import/export tools with SAX is also fairly easy.

Probably the biggest single issue with SAX is that by itself it doesn’t provide random access to XML data. Its event stream is “forward-only”: you can’t go backwards or reorder it unless you keep your own record of the events. Decisions about how to store that record belong to application layers on top of SAX, and you’ll need such layers if you use random access models such as XPath. Typically, applications use SAX to construct data structures that are either customized for their particular random access requirements or generic (typically DOM-like). You might create Person objects and index them by name, perhaps in some sort of hash table or using some kind of database as a backing store. In some applications it’s acceptable to just rescan small to midsize XML documents on demand; this can be inexpensive when the operating system has already cached the data.
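As a rough illustration of building a custom, randomly accessible structure from the forward-only event stream, here is a minimal sketch using the SAX binding in Python's standard library (`xml.sax`). The document layout (`people`, `person`, `name`, `email`) and the names `PersonIndexer` and `index_people` are invented for this example; the handler simply assumes each `person` holds flat text fields.

```python
import xml.sax


class PersonIndexer(xml.sax.ContentHandler):
    """Turns the forward-only event stream into a dict keyed by person name."""

    def __init__(self):
        super().__init__()
        self.people = {}       # name -> dict of that person's fields
        self._current = None   # the person being built, if any
        self._field = None     # the child element currently being read
        self._text = []        # character chunks for the current field

    def startElement(self, name, attrs):
        if name == "person":
            self._current = {}
        elif self._current is not None:
            self._field = name
            self._text = []

    def characters(self, content):
        # A parser may deliver text in several chunks, so collect them.
        if self._field is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "person":
            self.people[self._current.get("name")] = self._current
            self._current = None
        elif self._current is not None and name == self._field:
            self._current[name] = "".join(self._text).strip()
            self._field = None


def index_people(xml_text):
    """Parse once, then answer lookups from the in-memory index."""
    handler = PersonIndexer()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.people


SAMPLE = """<people>
  <person><name>Ada</name><email>ada@example.org</email></person>
  <person><name>Grace</name><email>grace@example.org</email></person>
</people>"""

people = index_people(SAMPLE)
```

After the single pass, lookups such as `people["Grace"]["email"]` no longer depend on document order. A database-backed store would follow the same pattern, with the handler writing rows instead of dictionary entries.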

If you’re looking for an API that helps you write a low-level XML text editor and lets you work with malformed XML while it preserves semantically meaningless information,[2] then SAX isn’t what you want. Similarly, parsing less than an entire XML document isn’t standardized by SAX (or by the XML specification). Such processing requires an API that works at the level of potentially malformed tokens. SAX (and any other application programming interface not targeted at text editors) makes hiding such details a primary goal. SAX works well for “structural” editors, which prevent creation of malformed XML and hide semantically meaningless information.
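The point about malformed input can be seen in a small sketch, again assuming Python's standard `xml.sax` binding; the helper name `first_error` is made up for this example. With the default error handling, the parse stops at the first well-formedness error instead of handing the application any broken tokens to work with.

```python
import xml.sax


def first_error(xml_text):
    """Return (line, message) for the first fatal parse error, or None."""
    try:
        # A do-nothing ContentHandler is enough to drive the parse.
        xml.sax.parseString(xml_text.encode("utf-8"), xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc.getLineNumber(), exc.getMessage()
    return None


well_formed = first_error("<a><b/></a>")
malformed = first_error("<a><b></a>")   # <b> is never closed
```

Here `well_formed` is `None`, while `malformed` carries the location and the parser's message. The exact wording of the message depends on the underlying parser, so treat it as diagnostic text rather than something to match on.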

It’s important to note that SAX is intentionally limited. It’s the core of a library of XML support, and the “S” in its name really does mean “simple”; complex functionality belongs in layers on top of SAX and is not part of SAX itself. Even basic facilities like XML text output (printing) are layered over SAX. While open source code to handle such functions is often available on the Internet, you may still need to find and choose between such libraries. SAX is something of a “close to the metal” low-level API, though it’s more flexible than most such APIs.
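To show what “layered over SAX” can look like for text output, here is a small sketch that assumes Python's standard library, where the `XMLGenerator` class in `xml.sax.saxutils` is a content handler that writes the events it receives back out as XML text. The function name `reserialize` is invented for this example.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


def reserialize(xml_text):
    """Parse XML text and print it again through a SAX-based writer."""
    out = io.StringIO()
    # XMLGenerator is an ordinary ContentHandler; the parser neither
    # knows nor cares that its events are being turned back into text.
    writer = XMLGenerator(out, encoding="utf-8")
    xml.sax.parseString(xml_text.encode("utf-8"), writer)
    return out.getvalue()


result = reserialize("<a x='1'><b>hi</b></a>")
```

The writer sees only SAX events, so details the events don't carry, such as the single quotes around the attribute value in the input, are not preserved in `result`; the writer applies its own quoting.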



[2] For example, whitespace outside element content, attribute order, or singly versus doubly quoted strings.
