Python DOM Offerings

Python has several different ways of working with the DOM; choose the one that best fits your needs. minidom is smaller and faster than a fully compliant DOM and suits the needs of most users. pulldom builds only the portion of the DOM needed for a particular application, which makes the DOM easier to use with large documents or under tight memory constraints. 4DOM is a full-fledged DOM Level 2 implementation. These are the dominant DOM implementations for Python and the only ones described here, but other implementations are available that may be better tailored to your requirements.

Streamlining with Minidom

minidom, part of the xml.dom package included with both the Python standard library and PyXML, is a lightweight DOM implementation. Its goal is to provide a simpler implementation with a smaller memory footprint than a full DOM implementation, and its methods for creating a DOM are simple as well. minidom also provides functions for parsing XML held in strings, such as parseString(), and methods such as toxml() for serializing nodes back into strings.
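As a minimal sketch of those string-oriented functions, the following parses a short fragment held in a Python string and serializes one element back out. The `<greeting>` element and its `lang` attribute are made up for illustration; parseString(), getAttribute(), and toxml() are standard minidom calls.

```python
from xml.dom import minidom

# Parse a short XML fragment held in a Python string.
doc = minidom.parseString('<greeting lang="en">Hello, world</greeting>')

root = doc.documentElement
print(root.tagName)                # greeting
print(root.getAttribute("lang"))   # en
print(root.firstChild.data)        # Hello, world

# Serialize a single node back to a string.
# (doc.toxml() would also prepend an XML declaration.)
print(root.toxml())                # <greeting lang="en">Hello, world</greeting>
```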

Overall, minidom may be best for loading simple (not necessarily small) configuration files for your applications, processing form submissions from web pages, handling user authorization, and anywhere else a “little” bit of XML is needed. Using minidom reduces both memory and processing-time overhead, two concerns of significant importance in web application development.
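For the configuration-file case, a sketch might look like the following. The `<config>`/`<setting>` format and the `load_settings` helper are hypothetical, invented for this example; minidom.parse() accepts either a filename or a file-like object, so an in-memory buffer stands in for a file on disk here.

```python
import io
from xml.dom import minidom

# Hypothetical configuration format, used only for illustration.
CONFIG_XML = b"""<?xml version="1.0"?>
<config>
  <setting name="host" value="localhost"/>
  <setting name="port" value="8080"/>
</config>"""

def load_settings(source):
    """Hypothetical helper: map each <setting> name to its value.

    `source` may be a filename or a file-like object,
    as accepted by minidom.parse().
    """
    doc = minidom.parse(source)
    try:
        return {
            node.getAttribute("name"): node.getAttribute("value")
            for node in doc.getElementsByTagName("setting")
        }
    finally:
        # Break minidom's internal reference cycles so memory is freed promptly.
        doc.unlink()

settings = load_settings(io.BytesIO(CONFIG_XML))
print(settings)  # {'host': 'localhost', 'port': '8080'}
```

Note that all attribute values come back as strings; converting `port` to an integer is left to the application.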

Using Pulldom

pulldom, which can also be imported from the xml.dom package, may be just the thing to save your life when you need to take a portion of a large XML document and create a DOM instance of that subset for manipulation. pulldom builds selected portions of a DOM from SAX events, and it uses minidom for the actual nodes it returns.
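A minimal sketch of that pattern: iterate over the event stream, and when an element of interest starts, call expandNode() to build a minidom subtree for just that element. The `<catalog>`/`<item>` document below is a small stand-in for a large file; in practice you would call pulldom.parse() on a file rather than pulldom.parseString().

```python
from xml.dom import pulldom

# Stand-in for a large document; only the <item> elements are of interest.
XML = """<catalog>
  <item id="1"><name>Widget</name></item>
  <item id="2"><name>Gadget</name></item>
</catalog>"""

names = []
events = pulldom.parseString(XML)
for event, node in events:
    # Events arrive one at a time as the underlying SAX parser proceeds.
    if event == pulldom.START_ELEMENT and node.tagName == "item":
        # Build a minidom subtree for this <item> only.
        events.expandNode(node)
        name = node.getElementsByTagName("name")[0].firstChild.data
        names.append((node.getAttribute("id"), name))

print(names)  # [('1', 'Widget'), ('2', 'Gadget')]
```

Everything outside the expanded `<item>` elements is seen only as a passing event and is never assembled into a full tree.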

pulldom seeks a middle ground between the DOM and SAX. It aims to remove the state management (the place-marking mentioned earlier) that SAX requires, while preserving SAX's stream-based processing for speed and efficiency. It also seeks to avoid the self-similar, intricate complexity of a complete DOM tree, with its many nodes and lists, and its memory-gobbling nature.

4DOM: A Full Implementation

Both minidom and pulldom have their specific niches, but for the remainder of this book, we work with 4DOM. 4DOM implements most of the DOM Level 2 features that actually make sense outside a browser.

After your experience with SAX earlier in this chapter, interacting with the DOM may seem incredibly easy by comparison. On the other hand, the seemingly endless layers of nested node classes may send you running back to SAX to do your string comparisons. However you fare, the next sections introduce you to working with the DOM in Python and provide a reference to its interfaces.

Regardless of the implementation you use, there are two basic types of operations you can perform with the DOM. The most common operations involve retrieving information from the document, which we discuss first. Once we cover that, we move on to explain how to use the DOM to modify and create documents.
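The two kinds of operations can be previewed with a short sketch. 4DOM shipped with PyXML rather than the standard library, so this example assumes minidom instead; the calls used (getElementsByTagName, getAttribute, createElement, setAttribute, createTextNode, appendChild) are standard DOM Core methods that minidom shares with other DOM implementations. The `<inventory>` document and its contents are made up for illustration.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<inventory><book id='b1'>Python &amp; XML</book></inventory>"
)

# 1. Retrieving information: locate nodes and read their content.
book = doc.getElementsByTagName("book")[0]
print(book.getAttribute("id"))   # b1
print(book.firstChild.data)      # Python & XML

# 2. Modifying and creating: build a new element and attach it to the tree.
new_book = doc.createElement("book")
new_book.setAttribute("id", "b2")
new_book.appendChild(doc.createTextNode("Hypothetical Title"))
doc.documentElement.appendChild(new_book)

print(doc.documentElement.toxml())
# <inventory><book id="b1">Python &amp; XML</book><book id="b2">Hypothetical Title</book></inventory>
```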
