Other Kinds of SAX2 Event Producers

Normally, an XMLReader turns XML text into SAX event callbacks. This book encourages you to think of those event consumer callbacks as the most important part of the process, so using XML text as input is just one option for feeding those consumers.
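As a minimal sketch of that idea, the following Java class feeds a SAX consumer from an in-memory list rather than from XML text. The class name ListEventProducer, the element names list and item, and the Recorder handler are all hypothetical, chosen only for illustration; the ContentHandler, DefaultHandler, and AttributesImpl types are the standard org.xml.sax ones.

```java
import java.util.Arrays;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class ListEventProducer {

    // Hypothetical producer: reports each string as an <item> element inside
    // a <list> root by calling the consumer directly. No XML text is parsed.
    static void produce(List<String> items, ContentHandler consumer) throws SAXException {
        Attributes noAttrs = new AttributesImpl();
        consumer.startDocument();
        consumer.startElement("", "list", "list", noAttrs);
        for (String item : items) {
            consumer.startElement("", "item", "item", noAttrs);
            char[] text = item.toCharArray();
            consumer.characters(text, 0, text.length);
            consumer.endElement("", "item", "item");
        }
        consumer.endElement("", "list", "list");
        consumer.endDocument();
    }

    // Consumer that records a compact trace of the events it receives.
    // The trace is for inspection only; it does no XML escaping.
    static class Recorder extends DefaultHandler {
        final StringBuilder trace = new StringBuilder();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            trace.append('<').append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            trace.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            trace.append("</").append(qName).append('>');
        }
    }

    // Convenience wrapper: runs the producer against a Recorder and returns the trace.
    static String trace(List<String> items) {
        Recorder recorder = new Recorder();
        try {
            produce(items, recorder);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return recorder.trace.toString();
    }

    public static void main(String[] args) {
        // prints <list><item>one</item><item>two</item></list>
        System.out.println(trace(Arrays.asList("one", "two")));
    }
}
```

The consumer cannot tell whether its callbacks came from a parser or from ordinary code, which is why the same handler can sit behind many different kinds of producers.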

For example, some SAX parsers have turned HTML text into SAX callbacks; there have even been SAX wrappers around the limited javax.swing.text.html parser. These wrappers can help you migrate to XHTML, first by making sure tags are properly formed, paired, and nested, then by helping make the XHTML valid so that more tools can work with it. Malformed HTML is a huge problem; there’s lots of brain-dead HTML text on the Web.[18] In practice, no generally available SAX HTML parser is quite good enough to substitute for tools like HTML Tidy (see http://tidy.sourceforge.net) combined with manual fixup for problem cases, but that could change.

DOM-to-SAX Event Production (and DOM4J, JDOM)

It’s so typical to want to turn a DOM node into a series of SAX events that SAX2 defined a standard way to do this. Several of the projects that claim to improve on DOM by being more Java-friendly, such as DOM4J and JDOM, have similar functionality.
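One way to see a DOM tree replayed as SAX events with nothing but the JDK is the JAXP identity transformer, sketched below. This is a swapped-in technique for illustration and not necessarily the SAX2 mechanism described here: it pairs a DOMSource with a SAXResult, both standard javax.xml.transform classes. The class name DomToSaxSketch and the NameCollector handler are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DomToSaxSketch {

    // Consumer that records the qualified name of each element it is told about.
    static class NameCollector extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            names.add(qName);
        }
    }

    // Builds a DOM tree from text (only so there is a tree to walk), then
    // replays that tree as SAX events through an identity transform.
    static List<String> elementNames(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(xml)));

            NameCollector collector = new NameCollector();
            // newTransformer() with no stylesheet yields an identity transform.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.transform(new DOMSource(doc), new SAXResult(collector));
            return collector.names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<order><item/><item/></order>"));
    }
}
```

Any ContentHandler can take the place of NameCollector, so the same call shape works whether the consumer collects names, validates structure, or writes text.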

In conjunction with any sort of SAX text output API (such as an XMLWriter), this technique is an easy way to turn a DOM tree into text. Utilities to turn a DOM node into text all need to do more or less the same thing: traverse the tree and emit the right sort of text. Using ...
