XML Processing Tools

Python ships with XML parsing support in its standard library and plays host to a vigorous XML special-interest group. XML (eXtended Markup Language) is a tag-based markup language for describing many kinds of structured data. Among other things, it has been adopted in roles such as a standard database and Internet content representation by many companies. As an object-oriented scripting language, Python mixes remarkably well with XML’s core notion of structured document interchange, and promises to be a major player in the XML arena.

XML is based upon a tag syntax familiar to web page writers. Python’s xmllib library module includes tools for parsing XML. In short, this XML parser is used by defining a subclass of an XMLParser Python class, with methods that serve as callbacks to be invoked as various XML structures are detected. Text analysis is largely automated by the library module. This module’s source code, file xmllib.py in the Python library, includes self-test code near the bottom that gives additional usage details. Python also ships with a standard HTML parser, htmllib, that works on similar principles and is based upon the sgmllib SGML parser module.

Unfortunately, Python’s XML support is still evolving, and describing it is well beyond the scope of this book. Rather than going into further details here, I will instead point you to sources for more information:

Standard library

First off, be sure to consult the Python library manual for more ...

Get Programming Python, Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.