Native Parser Interfaces

Now that we’ve seen how SAX can be used, and how regular the code for setting up the parser and the ContentHandler is, you may be wondering how much of that ease comes from SAX itself and how much comes from convenience functions in the Python libraries. We won’t delve deeply into the native interfaces of the individual parsers, but the question is a good one and leads to some interesting observations.

The key advantage to using SAX is that the callback methods have the same names and significance regardless of the actual parser you use. There are at least two nice results of this: changing parsers does not affect your application, and your code is more maintainable because someone new to the code is more likely to know the SAX interface than any particular parser-specific interface.
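To make the point about parser-independent callbacks concrete, here is a minimal sketch using the standard library's xml.sax package. The ElementCounter class, the count_elements function, and the sample document are hypothetical names invented for this illustration; they are not taken from the earlier examples. The handler only implements the standard startElement callback, so nothing in it depends on which underlying parser make_parser() happens to return.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class ElementCounter(ContentHandler):
    """Hypothetical handler: counts how often each element name occurs."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # SAX calls this method by the same name for every SAX driver.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(xml_bytes):
    """Parse a bytes document with whatever SAX parser is available."""
    handler = ElementCounter()
    parser = xml.sax.make_parser()      # the only parser-selection point
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.counts


# Illustrative sample document (an assumption, not from the text).
SAMPLE = b"<library><book/><book/><dvd/></library>"
sax_counts = count_elements(SAMPLE)
```

If a different SAX driver were substituted at the make_parser() call, the handler class would not need to change, which is the maintainability benefit described above.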

So just how do the native interfaces to the individual parsers differ from SAX, and why would we choose to use them instead? Let’s take a quick look at the PyExpat parser to get a taste of the differences.

Using PyExpat Directly

Of course, to use PyExpat, you need to have it installed. It is included as part of the Python installer for Windows, and is built automatically on Unix if you have the Expat library installed. If you did not install PyExpat as part of Python, it is installed as part of the PyXML package.

PyExpat resides in the xml.parsers.expat module. If we want to modify our last example to use PyExpat directly, we don’t have a lot of work to do, ...
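As a rough sketch of what direct use of the native interface can look like (not a reconstruction of the example referred to above), the following code counts element names with xml.parsers.expat. The function name count_elements_expat and the sample document are hypothetical. Note the differences from SAX: callbacks are plain callables assigned to attributes of the parser object, such as StartElementHandler, and the document is passed to the parser's Parse method.

```python
import xml.parsers.expat


def count_elements_expat(xml_bytes):
    """Hypothetical helper: count element names using Expat directly."""
    counts = {}

    def start_element(name, attrs):
        # Expat passes the element name and a dict of its attributes.
        counts[name] = counts.get(name, 0) + 1

    parser = xml.parsers.expat.ParserCreate()
    # Native interface: handlers are attributes on the parser object,
    # not methods on a separate ContentHandler instance.
    parser.StartElementHandler = start_element
    # The second argument marks this as the final chunk of input.
    parser.Parse(xml_bytes, True)
    return counts


# Illustrative sample document (an assumption, not from the text).
EXPAT_SAMPLE = b"<library><book/><book/><dvd/></library>"
expat_counts = count_elements_expat(EXPAT_SAMPLE)
```

The logic is the same as it would be in a SAX handler, but the handler names and the way they are registered are specific to Expat, so this code would have to be rewritten to run on another parser.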
