XML Processing Tools
Python ships with XML parsing support in its standard library and plays host to a vigorous XML special-interest group. XML (eXtended Markup Language) is a tag-based markup language for describing many kinds of structured data. Among other things, it has been adopted in roles such as a standard database and Internet content representation by many companies. As an object-oriented scripting language, Python mixes remarkably well with XML’s core notion of structured document interchange, and promises to be a major player in the XML arena.
XML is based
upon a tag syntax familiar to web page writers. Python’s
xmllib
library module includes tools for parsing
XML. In short, this XML parser is used by defining a subclass of an
XMLParser
Python class, with methods that serve as
callbacks to be invoked as various XML structures are detected. Text
analysis is largely automated by the library module. This
module’s source code, file xmllib.py
in
the Python library, includes self-test code near the bottom that
gives additional usage details. Python also ships with a standard
HTML parser, htmllib
, that works on similar
principles and is based upon the sgmllib
SGML
parser module.
Unfortunately, Python’s XML support is still evolving, and describing it is well beyond the scope of this book. Rather than going into further details here, I will instead point you to sources for more information:
- Standard library
First off, be sure to consult the Python library manual for more ...
Get Programming Python, Second Edition now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.