Regular Expressions and the re Module

A regular expression (RE) is a string that represents a pattern. With RE functionality, you can check any string against the pattern and see whether any part of the string matches it.

The re module supplies Python’s RE functionality. The compile function builds a RE object from a pattern string and optional flags. The methods of a RE object look for matches of the RE in a string or perform substitutions. Module re also exposes functions equivalent to a RE’s methods, but with the RE’s pattern string as the first argument.
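As a minimal sketch of the two calling styles just described, the snippet below compiles a pattern, calls methods on the resulting RE object, and then repeats the substitution through the equivalent module-level function. The pattern, the sample text, and the variable names are illustrative assumptions, not taken from the book.

```python
import re

# Build a RE object from a pattern string and an optional flag.
word_re = re.compile(r"spam", re.IGNORECASE)

text = "Spam, eggs, and more SPAM"  # illustrative sample text

# Methods of the RE object look for matches or perform substitutions.
first = word_re.search(text)          # match object for the first occurrence
replaced = word_re.sub("ham", text)   # every occurrence replaced

# The equivalent module-level function takes the pattern string
# as its first argument instead of being called on a RE object.
same_replaced = re.sub(r"spam", "ham", text, flags=re.IGNORECASE)

print(first.group())
print(replaced)
print(same_replaced == replaced)
```

Compiling once is convenient when the same pattern is used repeatedly; for a one-off search, the module-level function is usually enough.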

REs can be difficult to master, and this book does not purport to teach them; I cover only the ways in which you can use REs in Python. For general coverage of REs, I recommend the book Mastering Regular Expressions, by Jeffrey Friedl (O’Reilly). Friedl’s book offers thorough coverage of REs at both tutorial and advanced levels. Many tutorials and references on REs can also be found online.

Pattern-String Syntax

The pattern string representing a regular expression follows a specific syntax:

  • Alphabetic and numeric characters stand for themselves. A RE whose pattern is a string of letters and digits matches the same string.

  • Many alphanumeric characters acquire special meaning in a pattern when they are preceded by a backslash (\).

  • Punctuation works the other way around: a punctuation character matches itself when escaped, and can have special meaning when unescaped.

  • The backslash character is matched by a repeated backslash (i.e., the pattern \\).
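The four rules above can be tried out with a short sketch like the following. The sample strings and variable names are hypothetical, and the patterns are written as raw string literals (the r prefix) purely as a convenience chosen for this example, so that each backslash reaches the re module unchanged.

```python
import re

# Letters and digits stand for themselves.
literal = re.search(r"abc123", "xx abc123 yy")

# A backslash gives some alphanumerics special meaning:
# \d matches any single digit rather than the letter d.
digit = re.search(r"\d", "room 7")

# Unescaped punctuation can be special: . matches any character
# other than a newline, so a.c matches "abc".
any_char = re.search(r"a.c", "abc")

# Escaped punctuation matches itself: \. matches only a literal dot.
no_dot = re.search(r"a\.c", "abc")     # no literal dot in "abc"
real_dot = re.search(r"a\.c", "a.c")

# A doubled backslash in the pattern matches one literal backslash.
backslash = re.search(r"\\", "C:\\temp")  # the subject string is C:\temp

print(literal.group(), digit.group(), any_char.group())
print(no_dot, real_dot.group(), backslash.group())
```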

Since RE patterns often ...
