8.1. Making xgrep XML-Aware

The previous chapter showed that processing XML directly with regular expressions is possible but not without problems. Principal among them is that fully capturing the syntax of XML with regular expressions is unavoidably complex. As I mentioned, xmllib, the module in Python's standard library for simple XML processing, uses regular expressions. A look at xmllib readily shows how complex these expressions can get. For example, the following three regular expressions are lifted straight from xmllib.py. They illustrate what is involved in capturing the full syntactic structure of the XML declaration.

 CD-ROM reference=8001.txt
 _S = '[ \t\r\n]+'
 _opS = '[ \t\r\n]*'
 xmldecl ...
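To give a feel for the complexity involved, here is a minimal sketch of how whitespace fragments like _S and _opS can be combined into a pattern for the XML declaration. It is an illustration only, not the expression used in xmllib.py: the attr helper, the group names, and the simplified character classes for the version and encoding values are all assumptions made for this example.

```python
import re

# Whitespace fragments, named after the ones quoted from xmllib.py.
_S = r'[ \t\r\n]+'      # mandatory whitespace
_opS = r'[ \t\r\n]*'    # optional whitespace


def attr(name, value):
    """Hypothetical helper: build a pattern for name="value" or name='value'.

    The closing quote must be the same character as the opening quote,
    which is enforced with a named backreference.
    """
    return (name + _opS + '=' + _opS +
            r'(?P<%s_q>["\'])(?P<%s>%s)(?P=%s_q)' % (name, name, value, name))


# Simplified illustration (an assumption, not xmllib's own pattern):
# version is required; encoding and standalone are optional, in that order.
xmldecl = re.compile(
    r'<\?xml'
    + _S + attr('version', r'[a-zA-Z0-9_.:\-]+')
    + '(?:' + _S + attr('encoding', r'[A-Za-z][A-Za-z0-9._\-]*') + ')?'
    + '(?:' + _S + attr('standalone', r'yes|no') + ')?'
    + _opS + r'\?>'
)


def parse_xmldecl(text):
    """Return a dict of the declared values, or None if text does not
    start with a declaration that this simplified pattern accepts."""
    m = xmldecl.match(text)
    if m is None:
        return None
    return {
        'version': m.group('version'),
        'encoding': m.group('encoding'),      # None when absent
        'standalone': m.group('standalone'),  # None when absent
    }
```

Even this reduced version needs a helper function and three nested optional groups to cover one small construct, and it still ignores details that a complete parser has to handle.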
