Processing XML

The minute my students hear the term XML, they reach for a full XML parser. But when we’re facing a small lizard, there’s no point in reaching for a dinosaur-crushing asteroid. XML parsers are big and slow, and if they build a tree in memory, they can’t handle files bigger than the computer’s memory or infinite streams from sockets. Remember that we can view even complicated files like English text in multiple ways: as sequences of characters, words, or sentences. If we can get away with inspecting tokens (for XML, individual tags) instead of sentences, we’ve got a much easier problem to solve.
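To make the token-level view concrete, here is a minimal sketch in Python (not code from this book) that pulls tag names out of a text stream without building a tree. It reads fixed-size chunks and keeps only a possibly incomplete trailing tag between reads, so memory stays bounded even for very large files or unending socket streams. The function name `scan_tags`, the `TAG` pattern, and the `chunk_size` parameter are hypothetical. The pattern is a simplifying assumption: it skips `<?...?>` declarations but does not understand comments, CDATA sections, or `>` inside attribute values.

```python
import io
import re
from typing import Iterator, TextIO

# Matches a start, end, or empty-element tag such as <target ...>, </target>, <br/>.
# Group 1 is "/" for end tags, group 2 is the tag name.
# Simplifying assumption: no comments, CDATA, or '>' inside attribute values.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.\-]*)[^<>]*>")

def scan_tags(stream: TextIO, chunk_size: int = 4096) -> Iterator[str]:
    """Yield tag names (end tags prefixed with '/') from a text stream,
    reading fixed-size chunks so memory use stays bounded."""
    pending = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        last = 0
        for m in TAG.finditer(pending):
            yield m.group(1) + m.group(2)
            last = m.end()
        # Keep only a trailing '<...' that has no closing '>' yet: it may be
        # a tag split across chunk boundaries. Everything else is discarded.
        tail = pending[last:]
        lt = tail.rfind("<")
        pending = tail[lt:] if lt != -1 and ">" not in tail[lt:] else ""

if __name__ == "__main__":
    sample = io.StringIO("<project><target name='a'/></project>")
    print(list(scan_tags(sample)))  # ['project', 'target', '/project']
```

Passing an open file such as `open("config.xml")` works the same way, because the generator only ever asks the stream for the next chunk.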

Let’s say we need a list of all target tags in some XML file. grep makes short work of that without writing any code:

$ grep '<target' config.xml
<target ...
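For comparison, a minimal Python sketch of the same line filter (an illustration, not the book's code) shows why this approach copes with input of any size: it examines one line at a time and never holds the whole file. The function name `grep_lines` and the script name `grep_target.py` in the comment are hypothetical. Like `grep`, it reports whole lines, so a tag split across lines is missed and a line holding several tags is printed once.

```python
import sys
from typing import Iterable, Iterator

def grep_lines(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Yield each line that contains needle, one line at a time, like `grep needle`."""
    for line in lines:
        if needle in line:
            yield line

if __name__ == "__main__":
    # Hypothetical usage: python grep_target.py < config.xml
    for line in grep_lines(sys.stdin, "<target"):
        sys.stdout.write(line)
```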
