Relating Document Structure to Nodes

Although the DOM doesn’t specify an interface for parsing a document, it does specify how the document’s syntactic structures are encoded as DOM objects. A document is stored as a hierarchical tree, with each item in the tree linked to its parent, children, and siblings. Consider the following short document:

<sample bogus="value"><text_node>Test data.</text_node></sample>

Figure 19-1 shows how the preceding short sample document would be stored by a DOM parser.

Figure 19-1. Document storage and linkages
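The linkages in the figure can be inspected directly in any DOM implementation. The following is a minimal sketch using Python's standard xml.dom.minidom module, which is an assumption of this example and not the parser used elsewhere in this chapter. The helper name describe_links is hypothetical. It reports the parent, first child, and next sibling of each node in the sample document.

```python
from xml.dom.minidom import parseString

# The short sample document from the text.
SAMPLE = '<sample bogus="value"><text_node>Test data.</text_node></sample>'

doc = parseString(SAMPLE)          # the Document node at the root of the tree
sample = doc.documentElement       # the <sample> element
text_node = sample.firstChild      # the <text_node> element
text = text_node.firstChild        # the Text node holding "Test data."


def describe_links(node):
    """Return the names of a node's parent, first child, and next sibling.

    describe_links is a hypothetical helper written for this sketch.
    """
    def name(n):
        return n.nodeName if n is not None else None

    return {
        "parent": name(node.parentNode),
        "first_child": name(node.firstChild),
        "next_sibling": name(node.nextSibling),
    }


for node in (doc, sample, text_node, text):
    print(node.nodeName, describe_links(node))

# The bogus attribute is reached through the element, not through its child list.
print(sample.getAttribute("bogus"), len(sample.childNodes))
```

Note that the attribute does not appear among the children of the sample element: the element has a single child, and the attribute value is retrieved through the element's attribute accessors instead.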

Each Node-derived object in a parsed DOM document contains references to its parent, child, and sibling nodes. These references make it possible for applications to enumerate document data using any number of standard tree-traversal algorithms. “Walking the tree” is a common approach to finding information stored in a DOM and is demonstrated in Example 19-1 at the end of this chapter.
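As a rough illustration of walking the tree, the sketch below performs a pre-order traversal using only the firstChild and nextSibling references. It assumes Python's standard xml.dom.minidom module, and the function names walk and outline are hypothetical. It is not the chapter's Example 19-1, only a small stand-in for the same idea.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def walk(node, depth=0):
    """Yield (depth, node) pairs in document order (a pre-order traversal)."""
    yield depth, node
    child = node.firstChild
    while child is not None:
        # Descend into each child, then move sideways via the sibling link.
        yield from walk(child, depth + 1)
        child = child.nextSibling


def outline(xml_text):
    """Return one indented line per node; text nodes show their data."""
    doc = parseString(xml_text)
    lines = []
    for depth, node in walk(doc):
        if node.nodeType == Node.TEXT_NODE:
            label = repr(node.data)
        else:
            label = node.nodeName
        lines.append("  " * depth + label)
    return lines


if __name__ == "__main__":
    sample = '<sample bogus="value"><text_node>Test data.</text_node></sample>'
    print("\n".join(outline(sample)))
```

For the sample document, this prints the Document node, the sample element, the text_node element, and the text itself, each indented one level deeper than its parent. A generator is used here so that callers can stop the traversal early once they find what they need.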
