Programming Interfaces for XML: DOM, SAX, and Others

The two most popular APIs used to parse XML documents are the Document Object Model (DOM) and the Simple API for XML (SAX). DOM is an official recommendation of the W3C (available at http://www.w3.org/TR/REC-DOM-Level-1), while SAX is a de facto standard created by David Megginson and others on the XML-DEV mailing list (http://lists.xml.org/archives). We’ll discuss these two APIs briefly here. We won’t use them much in this book, but learning more about them will give you some insight into how most XSLT processors work.

Tip

See http://www.saxproject.org/ for the SAX standard. If you’d like to learn more about the XML-DEV mailing list, send email to . You can also check out http://lists.xml.org/archives/xml-dev/ to see the XML-DEV mailing list archives.

DOM

DOM is designed to build a tree view of your document. Remember that every well-formed XML document has exactly one element that contains all of the others. That single element then becomes the root of the tree. The DOM specification defines several language-neutral interfaces, described here:

Node

This interface is the base datatype of the DOM. Document, Element, Attr, Text, Comment, and ProcessingInstruction all extend the Node interface.

Document

This object contains the DOM representation of the XML document. Given a Document object, you can get the root of the tree (the document element); from the root, you can move through the tree to find all elements, attributes, text, comments, ...
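
To make the interfaces above more concrete, here is a minimal sketch of walking a DOM tree. It is only an illustration, not code from this book: it assumes Python's standard xml.dom.minidom module as the DOM implementation, and the sample document, the NODE_TYPE_NAMES table, and the walk function are hypothetical names invented for the example.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical sample document, invented for this sketch.
SAMPLE_XML = '<greeting lang="en"><!-- salutation -->Hello, <b>World</b>!</greeting>'

# Map DOM node-type constants to the interface names used in the text.
NODE_TYPE_NAMES = {
    Node.ELEMENT_NODE: "Element",
    Node.TEXT_NODE: "Text",
    Node.COMMENT_NODE: "Comment",
    Node.PROCESSING_INSTRUCTION_NODE: "ProcessingInstruction",
}


def walk(node, depth=0):
    """Return one indented line per node, visiting the tree depth-first."""
    label = NODE_TYPE_NAMES.get(node.nodeType, "Node")
    if node.nodeType == Node.ELEMENT_NODE:
        attrs = node.attributes  # a NamedNodeMap of Attr nodes
        attr_text = " ".join(
            f'{attrs.item(i).name}="{attrs.item(i).value}"'
            for i in range(attrs.length)
        )
        detail = node.tagName + (f" [{attr_text}]" if attr_text else "")
    else:
        # Text, Comment, and ProcessingInstruction expose their content as nodeValue.
        detail = repr(node.nodeValue)
    lines = ["  " * depth + f"{label}: {detail}"]
    for child in node.childNodes:
        lines.extend(walk(child, depth + 1))
    return lines


# The parser returns a Document; its documentElement is the root of the tree.
document = parseString(SAMPLE_XML)
root = document.documentElement
tree_lines = walk(root)
print("\n".join(tree_lines))
```

Every object the function visits is a Node, so the same childNodes loop works whether the current node is an Element, a Text node, or a Comment. Notice that the whole tree is built in memory before the walk begins.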
