7.12. Parsing XML with Regular Expressions

The XML specification rigorously defines the structure of an XML document in the form of a grammar. Much of this grammar lends itself to implementation with regular expressions, although the expressions are admittedly complex.

The xmllib library, which was part of the standard Python distribution (it has been deprecated since Python 2.0 and was removed in Python 3), makes extensive use of regular expressions. It is well worth reading the xmllib source to get an idea of how serious regular-expression programming is done.

One particularly important point that xmllib illustrates well is that complex regular expressions become far more legible when they are built up from small, named pieces.
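As a minimal sketch of this idea (not xmllib's own code), the fragment below builds larger patterns by concatenating small named strings. The names _S, _opt_S, _Name, _QStr, entity_ref, attribute, start_tag, and parse_start_tag are hypothetical and chosen for illustration, and the pieces are deliberately simplified: names are ASCII-only, so they cover only a subset of what the XML grammar allows.

```python
import re

# Small building blocks (hypothetical names, simplified from the XML grammar).
_S = r"[ \t\r\n]+"                        # required whitespace
_opt_S = r"[ \t\r\n]*"                    # optional whitespace
_Name = r"[A-Za-z_:][-A-Za-z0-9._:]*"     # ASCII-only approximation of an XML name
_QStr = r"(?:'[^']*'|\"[^\"]*\")"         # single- or double-quoted string

# Larger patterns assembled from the pieces above.
entity_ref = re.compile("&(?P<name>" + _Name + ");")
attribute = re.compile(
    "(?P<key>" + _Name + ")" + _opt_S + "=" + _opt_S + "(?P<value>" + _QStr + ")"
)
start_tag = re.compile(
    "<(?P<tag>" + _Name + ")"
    "(?P<attrs>(?:" + _S + _Name + _opt_S + "=" + _opt_S + _QStr + ")*)"
    + _opt_S + "(?P<empty>/?)>"
)


def parse_start_tag(text):
    """Return (tag, attributes, is_empty) for a start tag, or None if no match."""
    m = start_tag.match(text)
    if m is None:
        return None
    attrs = {}
    for a in attribute.finditer(m.group("attrs")):
        # Strip the surrounding quote characters from the value.
        attrs[a.group("key")] = a.group("value")[1:-1]
    return m.group("tag"), attrs, m.group("empty") == "/"
```

Because each piece has a name, a change such as widening _Name to accept more characters is made in one place and carries through to every pattern that uses it.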

The following two lines of code are lifted straight from xmllib. Note how the entityref regular ...
