16.4. Handling Large XML Files with xTract

The implementation of xTract shown above has the virtue of being small and easy to understand. Unfortunately, it does not scale well: if you were to use it to search a 30-Mbyte XML file, you would be in for a long wait. The crux of the problem is that xTract, as coded above, is memory bound, because it reads the entire XML file to be searched into memory.

It would be possible to rewrite xTract to use a completely event-driven approach. However, it is not an inviting prospect. Finding, say, title elements whose contents match the regular expression "P*.?n" can only be done once the entire content of the title element has been seen, that is, not until after its end-tag event has been dispatched. A default start of ...
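To make the buffering problem concrete, here is a minimal sketch of what an event-driven search for title elements might look like. It is not the xTract code; it assumes the standard library's xml.sax module, and the names TitleMatcher and find_titles are hypothetical, chosen only for illustration. It also assumes that "match" means a search anywhere in the element's text and that title elements are not nested inside one another.

```python
import io
import re
import xml.sax


class TitleMatcher(xml.sax.ContentHandler):
    """Hypothetical handler: buffer the text of each title element and
    test it against a regular expression when the end tag arrives."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = re.compile(pattern)
        self.buffer = None   # None means "not currently inside a title"
        self.matches = []

    def startElement(self, name, attrs):
        if name == "title":
            self.buffer = []  # start collecting character data

    def characters(self, content):
        # The parser may deliver text in several chunks, so nothing can
        # be decided here; the chunks are only accumulated.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "title" and self.buffer is not None:
            text = "".join(self.buffer)
            # Only now is the full content known, so only now can the
            # regular expression be applied (assumption: search, not
            # a match anchored at the start).
            if self.pattern.search(text):
                self.matches.append(text)
            self.buffer = None


def find_titles(source, pattern):
    """Parse source (a filename or binary file object) as a stream and
    return the text of every title element that matches pattern."""
    handler = TitleMatcher(pattern)
    xml.sax.parse(source, handler)
    return handler.matches


if __name__ == "__main__":
    doc = (b"<books><book><title>Python</title></book>"
           b"<book><title>Perl</title></book></books>")
    print(find_titles(io.BytesIO(doc), r"P*.?n"))  # prints ['Python']
```

Even in this small case, the handler has to carry state from one event to the next and postpone the test until the end tag. Memory use stays proportional to the largest title element rather than to the whole file, but every additional kind of query would need its own bookkeeping of this sort.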
