7.13. Cautionary Tales

The examples I have used in this chapter, together with the fragment from the xmllib standard library module, strongly suggest that XML can be gainfully processed with Python's regular-expression support. This is indeed the case, but a number of caveats need to be borne in mind.

Some constructs you can encounter in an XML document can cause simple regular expression-based programs to go astray. Some of the more common “gotchas” are:

  • Comments

  • CDATA sections

  • General entity references

  • Single-quoted attribute value literals

  • Document-type declaration subsets

If you do not use these features of XML (and feel confident that the XML you will have to process will never contain any of them), feel free to skip this section.
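To make the risk concrete, here is a minimal sketch of how a simple pattern can go astray on several of the constructs listed above. The pattern NAIVE_START_TAG and the helper naive_tags are hypothetical names made up for this illustration; they are not taken from xmllib or from the earlier examples in this chapter.

```python
import re

# Hypothetical naive pattern: a start tag (or empty-element tag) carrying
# exactly one double-quoted attribute. Invented for illustration only.
NAIVE_START_TAG = re.compile(r'<(\w+)\s+(\w+)="([^"]*)"\s*/?>')


def naive_tags(xml_text):
    """Return (element, attribute, value) triples found by the naive pattern."""
    return NAIVE_START_TAG.findall(xml_text)


# The simple case works as expected.
plain = '<item id="1"/>'

# Markup inside a comment is not markup, but the pattern cannot know that.
commented = '<!-- <item id="0"/> --><item id="1"/>'

# Text inside a CDATA section is character data, yet it matches as a tag.
cdata = '<doc><![CDATA[<item id="0"/>]]><item id="1"/></doc>'

# Single-quoted attribute values are legal XML, but the pattern misses them.
single_quoted = "<item id='1'/>"

# Entity references are returned unexpanded; a parser would report "a&b".
entity = '<item id="a&amp;b"/>'

for label, text in [("plain", plain), ("commented", commented),
                    ("cdata", cdata), ("single_quoted", single_quoted),
                    ("entity", entity)]:
    print(label, naive_tags(text))
```

The comment and CDATA inputs each yield a spurious extra match, the single-quoted input yields no match at all, and the entity reference comes back as raw text. Whether any of this matters depends on the XML you actually have to process.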

7.13.1. Comments

An XML document ...
