Chapter 4. Parsers and State Machines

All the techniques presented in the prior chapters of this book have something in common, but something that is easy to overlook. In a sense, every basic string and regular expression operation treats strings as homogeneous. Put another way: String and regex techniques operate on flat texts. While said techniques are largely in keeping with the “Zen of Python” maxim that “Flat is better than nested,” sometimes the maxim (and homogeneous operations) cannot solve a problem. Sometimes the data in a text has a deeper structure than the linear sequence of bytes that make up strings.

It is not entirely true that the prior chapters have eschewed data structures. From time to time, the examples presented ...

Get Text Processing in Python now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.