O'Reilly logo

Programming Python, 3rd Edition by Mark Lutz

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

XML Processing Tools

Python ships with XML parsing support in its standard library and plays host to a vigorous XML special-interest group. XML (eXtended Markup Language) is a tag-based markup language for describing many kinds of structured data. Among other things, it has been adopted in roles such as a standard database and Internet content representation by many companies. As an object-oriented scripting language, Python mixes remarkably well with XML's core notion of structured document interchange.

XML is based upon a tag syntax familiar to web page writers. Python's standard library xml module package includes tools for parsing XML, in both the SAX and the DOM parsing models. In short, SAX parsers provide a subclass with methods called during the parsing operation, and DOM parsers are given access to an object tree representing the (usually) already parsed document. SAX parsers are essentially state machines and must record details as the parse progresses; DOM parsers walk object trees using loops, attributes, and methods defined by the DOM standard.

Beyond these parsing tools, Python also ships with an xmlrpclib to support the XML-RPC protocol (remote procedure calls that transmit objects encoded as XML over HTTP), as well as a standard HTML parser, htmllib, that works on similar principles and is based upon the sgmllib SGML parser module. The third-party domain has even more XML-related tools; some of these are maintained separately from Python to allow for more flexible ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required