26.9. Regular Expressions

For text analysis, Python provides the regex and re modules. The regex module is old and deprecated; it uses an Emacs-style syntax that some users find difficult to read, and it was removed entirely in Python 2.5. The re module lets you construct advanced pattern-matching algorithms in a less arcane syntax. Regular expressions are, in effect, a small, highly specialized programming language embedded in Python and made available through the re module. With it, you specify the rules for the set of possible strings that you want to match. You can then test whether a string matches the pattern at its start or whether the pattern matches anywhere in the string. You can also use the re module to modify a string or to split it apart in various ways. The following sections cover basic Python regular expressions.
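
The four uses just described (matching at the start, matching anywhere, modifying, and splitting) can be sketched roughly as follows. This is only an illustrative example: the sample string and variable names are made up for the purpose, and the calls shown are the standard re.match, re.search, re.sub, and re.split functions.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "spam and eggs, then more spam"

# re.match succeeds only if the pattern matches at the start of the string.
at_start = re.match(r"spam", text)
not_at_start = re.match(r"eggs", text)   # None: "eggs" is not at the start

# re.search scans the whole string and returns the first match found.
anywhere = re.search(r"eggs", text)

# re.sub returns a modified copy with every match replaced.
replaced = re.sub(r"spam", "ham", text)

# re.split breaks the string apart wherever the pattern matches.
parts = re.split(r",\s*", text)

print(at_start is not None)    # True
print(not_at_start is None)    # True
print(anywhere.start())        # 9
print(replaced)                # ham and eggs, then more ham
print(parts)                   # ['spam and eggs', 'then more spam']
```

Note that both re.match and re.search return None when nothing matches, so the result should be checked before calling methods such as start() on it.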

26.9.1. Regular Expression Operations

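In Python, string methods are typically used for simple searching, replacing, and parsing, while regular expressions handle more complex matching. Unlike Perl or JavaScript, Python has no regular expression literal delimited with forward slashes; a pattern is written as an ordinary string (often a raw string such as r'foo*') and passed to the re module. Patterns are compiled into RegexObject instances, which have methods for operations such as searching for pattern matches or performing string substitutions.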

>>> import re
>>> reobj = re.compile('foo*')
>>> print reobj
<_sre.SRE_Pattern object at 0x403c38c0>
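
As a rough sketch of what the compiled object can then do, the example below reuses the same 'foo*' pattern with the search, findall, and sub methods. The input strings are invented for illustration, and the code is written so that it should run unchanged under both Python 2 and Python 3.

```python
import re

# Same pattern as above: "fo" followed by zero or more further "o" characters.
reobj = re.compile(r"foo*")

# search() returns a match object for the first match, or None if there is none.
m = reobj.search("a fooood fight")
found = m.group()

# findall() returns every non-overlapping match as a list of strings.
all_hits = reobj.findall("fo foo fooo f")

# sub() returns a copy of the string with each match replaced.
cleaned = reobj.sub("bar", "fo foo fooo f")

print(found)       # foooo
print(all_hits)    # ['fo', 'foo', 'fooo']
print(cleaned)     # bar bar bar f
```

Compiling once and reusing the object is mainly a convenience when the same pattern is applied many times; the module-level functions such as re.search accept the pattern string directly.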

26.9.2. Regex Special Characters

The following table describes some of the more popular special characters ...
