Event Consumer Issues

The primary Infoset concern for SAX2 event consumers is to understand how the stream of events represents the information structures used in the Infoset. Applications must track state themselves if they need access to some of those structures, or random access to anything. It’s typical to track only a few items and ignore the rest as incidental background noise. Streaming processing discards items as soon as possible.

You really shouldn’t care, but since the String datatype can’t handle more than two gigabytes of data, and strings are used to pass certain document data to applications, there’s a chance that some documents could cause trouble by overflowing that limit. If you encounter such a document, consult a pathologist. There really isn’t much you can do about this.

Structural Issues

The [children] properties are arbitrarily sized, ordered sequences of information items, which SAX2 event callbacks present in document order. Most other properties, such as [notations], [unparsed entities], and [attributes], are unordered collections. Only [children] properties need to be stored in order-preserving data structures.
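As a minimal sketch of that distinction, the handler below keeps each element's children in a list (which preserves document order) and its attributes in a hash map (which doesn't). The class name OrderedChildren, the nested Node type, and the parse() helper are hypothetical names invented for this illustration; only element children are recorded, and a nonvalidating, non-namespace-aware JAXP parser is assumed.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration: not part of SAX2 itself.
class OrderedChildren extends DefaultHandler {

    static final class Node {
        final String name;
        // [attributes] is unordered, so any map will do.
        final Map<String, String> attributes = new HashMap<>();
        // [children] is ordered, so use an order-preserving list.
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        List<String> childNames() {
            List<String> names = new ArrayList<>();
            for (Node child : children) names.add(child.name);
            return names;
        }
    }

    private final Deque<Node> open = new ArrayDeque<>();
    Node root;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        Node node = new Node(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            node.attributes.put(atts.getQName(i), atts.getValue(i));
        }
        if (open.isEmpty()) {
            root = node;                        // the document element
        } else {
            open.peekLast().children.add(node); // appended in document order
        }
        open.addLast(node);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.removeLast();
    }

    // Convenience wrapper (an assumption of this sketch) that parses a string.
    static Node parse(String xml) {
        try {
            OrderedChildren handler = new OrderedChildren();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.root;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }
}
```

Calling OrderedChildren.parse() on a small document and reading root.childNames() returns the child element names in the order they appeared, while root.attributes makes no promise about iteration order.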

While most information items are provided through a single callback, some of the more complex ones involve matched, and (except in one case) cleanly nested, pairs of calls to start() and end() the item. Such items include the Document itself, its Document Type Declaration, Elements, and Namespace Information. To track ...
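For the element case, the matched start and end callbacks can be followed with a simple stack. This is only a sketch under stated assumptions: PathTracker and pathsOf() are hypothetical names made up for the example, it handles startElement() and endElement() alone, and it uses qualified names as reported by a non-namespace-aware JAXP parser.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration: not part of SAX2 itself.
class PathTracker extends DefaultHandler {
    // Names of the elements that have been started but not yet ended.
    private final Deque<String> open = new ArrayDeque<>();
    // One entry per element, recorded when its start callback arrives.
    final List<String> paths = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        open.addLast(qName);
        paths.add("/" + String.join("/", open));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Element callbacks nest cleanly, so the matching start is on top.
        open.removeLast();
    }

    // Convenience wrapper (an assumption of this sketch) that parses a string.
    static List<String> pathsOf(String xml) {
        try {
            PathTracker handler = new PathTracker();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.paths;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }
}
```

The stack holds only the currently open elements, so memory use grows with nesting depth rather than document size, which fits the streaming style described earlier.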
