Event-Driven XML Processing

As an XML parser reads a document, it moves from the beginning of the document to the end. It may pause to retrieve external resources—for a DTD or an external entity, for instance—but it builds an understanding of the document as it moves along. Tree-based XML technologies (such as the DOM) combine these incremental parsing events into a monolithic image of an XML document once parsing has been completed successfully.
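As a rough illustration of the tree-based approach, the sketch below uses the `xml.dom.minidom` module from Python's standard library. The choice of Python and of a small one-line sample document are assumptions made for illustration only. The point is that nothing can be inspected until `parseString` has returned the complete tree.

```python
from xml.dom import minidom

# Sample document assumed for illustration.
DOCUMENT = "<name><given>Keith</given><family>Johnson</family></name>"

# The whole document is parsed before any of it becomes available.
dom = minidom.parseString(DOCUMENT)
root = dom.documentElement

# Once the tree exists, it can be navigated in any order.
given = root.getElementsByTagName("given")[0]
child_names = [child.tagName for child in root.childNodes]

print(root.tagName)
print(given.firstChild.data)
print(child_names)
```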

Event-based parsers, on the other hand, report these interim events to their client applications as they happen. Some common parsing events are element start-tag read, element content read, and element end-tag read. For example, consider the document in Example 18-1.

Example 18-1. Simple XML document
<name><given>Keith</given><family>Johnson</family></name>

An event-based parser might report events such as these:

startElement:name
startElement:given
content:Keith
endElement:given
startElement:family
content:Johnson
endElement:family
endElement:name
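A listing like this one can be reproduced with a few lines of code. The following is a minimal sketch, assuming Python's standard `xml.sax` module as the event-based parser. The `EventLogger` class and its `events` list are hypothetical names introduced here, and the event labels simply mirror the listing above rather than any official notation.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler that records each parsing event as a string."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(f"startElement:{name}")

    def characters(self, content):
        # A parser may deliver text in several chunks; for this short
        # document each text node is expected to arrive in one piece.
        self.events.append(f"content:{content}")

    def endElement(self, name):
        self.events.append(f"endElement:{name}")


DOCUMENT = b"<name><given>Keith</given><family>Johnson</family></name>"

logger = EventLogger()
xml.sax.parseString(DOCUMENT, logger)
print("\n".join(logger.events))
```

The parser calls the handler's methods as it reads, so the application sees `startElement:name` before the parser has even reached the `given` element.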

The list and structure of events can become much more complex as features such as namespaces, attributes, whitespace between elements, comments, processing instructions, and entities are added, but the basic mechanism is quite simple and generally very efficient.
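To give a sense of how quickly the event stream grows, the sketch below adds an attribute, a processing instruction, and whitespace between elements to the sample document, again assuming Python's `xml.sax`. The `DetailedLogger` class, the `id` attribute, and the `display` processing instruction are all hypothetical. The exact number of whitespace events depends on how the parser chunks character data, so no fixed output is shown.

```python
import xml.sax


class DetailedLogger(xml.sax.ContentHandler):
    """Hypothetical handler that also records attributes, PIs, and whitespace."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(f"startElement:{name}")
        for attr_name in attrs.getNames():
            self.events.append(f"attribute:{attr_name}={attrs.getValue(attr_name)}")

    def characters(self, content):
        # Whitespace between elements is reported just like other text.
        kind = "whitespace" if content.isspace() else "content"
        self.events.append(f"{kind}:{content!r}")

    def endElement(self, name):
        self.events.append(f"endElement:{name}")

    def processingInstruction(self, target, data):
        self.events.append(f"processingInstruction:{target} {data}")


DOCUMENT = (
    b'<name id="p1">\n'
    b"  <?display compact?>\n"
    b"  <given>Keith</given>\n"
    b"</name>"
)

detailed = DetailedLogger()
xml.sax.parseString(DOCUMENT, detailed)
print("\n".join(detailed.events))
```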

Event-based applications are generally more complex than tree-based applications. Processing events typically means the creation of a state machine, code that understands the current context and can route the information in the events to the ...
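A very small state machine of this kind might look like the following sketch, which again assumes Python's `xml.sax`. The `NameCollector` class is hypothetical: it keeps a stack of open elements as its state and uses that context to decide which text to keep.

```python
import xml.sax


class NameCollector(xml.sax.ContentHandler):
    """Hypothetical handler that tracks context with a stack of open elements."""

    WANTED = (["name", "given"], ["name", "family"])

    def __init__(self):
        super().__init__()
        self.path = []     # currently open elements: the handler's state
        self.buffer = []   # text chunks collected for the current element
        self.result = {}

    def startElement(self, name, attrs):
        self.path.append(name)
        self.buffer = []

    def characters(self, content):
        # Keep text only when the current context is one we care about.
        if self.path in self.WANTED:
            self.buffer.append(content)

    def endElement(self, name):
        if self.path in self.WANTED:
            self.result[name] = "".join(self.buffer)
        self.path.pop()
        self.buffer = []


DOCUMENT = b"<name><given>Keith</given><family>Johnson</family></name>"

collector = NameCollector()
xml.sax.parseString(DOCUMENT, collector)
print(collector.result)  # {'given': 'Keith', 'family': 'Johnson'}
```

Even in this tiny case the handler has to remember where it is in the document, which is bookkeeping that a tree-based API would do on the application's behalf.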

