O'Reilly logo

Programming Python, Second Edition by Mark Lutz

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Regular Expression Matching

Splitting and joining strings is a simple way to process text, as long as it follows the format you expect. For more general text analysis tasks, Python provides regular expression matching utilities. Regular expressions (REs) are simply strings that define patterns to be matched against other strings. You supply a pattern and a string, and ask if the string matches your pattern. After a match, parts of the string matched by parts of the pattern are made available to your script. That is, matches not only give a yes/no answer, but they can pick out substrings as well.

Regular expression pattern strings can be complicated (let’s be honest -- they can be downright gross to look at). But once you get the hang of them, they can replace larger hand-coded string search routines. In Python, regular expressions are not part of the syntax of the Python language itself, but are supported by extension modules that you must import to use. The modules define functions for compiling pattern strings into pattern objects, matching these objects against strings, and fetching matched substrings after a match.

Beyond those generalities, Python’s regular expression story is complicated a little by history:

The regex module (old)

In earlier Python releases, a module called regex was the standard (and only) RE module. It was fast and supported patterns coded in awg, grep, and emacs style, but it is now somewhat deprecated (though it will likely still be available for some ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required