Natural Language Processing with Python by Edward Loper, Steven Bird, Ewan Klein

Regular Expressions for Detecting Word Patterns

Many linguistic processing tasks involve pattern matching. For example, we can find words ending with ed using endswith('ed'). We saw a variety of such “word tests” in Table 1-4. Regular expressions give us a more powerful and flexible method for describing the character patterns we are interested in.
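As a quick reminder of what such a word test looks like, here is a minimal sketch that filters a list with endswith('ed'). The list sample_words is a small hypothetical stand-in for a real corpus, not data from the book.

```python
# Hypothetical sample words, standing in for a real word list
sample_words = ['walked', 'jumping', 'red', 'edit', 'abandoned', 'Ed']

# Keep only the words whose final two characters are 'ed'.
# The test is case-sensitive, so 'Ed' is not selected.
ed_words = [w for w in sample_words if w.endswith('ed')]
print(ed_words)
```

A test like this only handles one fixed suffix; describing richer patterns is where regular expressions become useful.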

Note

There are many other published introductions to regular expressions, organized around the syntax of regular expressions and applied to searching text files. Instead of doing this again, we focus on the use of regular expressions at different stages of linguistic processing. As usual, we’ll adopt a problem-based approach and present new features only as they are needed to solve practical problems. In our discussion we will mark regular expressions using chevrons like this: «patt».

To use regular expressions in Python, we need to import the re library using: import re. We also need a list of words to search; we’ll use the Words Corpus again (Lexical Resources). We will preprocess it to remove any proper names.

>>> import re
>>> import nltk
>>> wordlist = [w for w in nltk.corpus.words.words('en') if w.islower()]

Using Basic Metacharacters

Let’s find words ending with ed using the regular expression «ed$». We will use the re.search(p, s) function to check whether the pattern p can be found somewhere inside the string s. We need to specify the characters of interest, and use the dollar sign, which has a special behavior in the context of regular expressions ...
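The effect of the dollar sign can be seen in a minimal sketch like the following, which compares «ed$» with the unanchored pattern «ed». It assumes a small hypothetical list, sample_words, in place of the Words Corpus so that it runs without any corpus download.

```python
import re

# Hypothetical sample words, standing in for the full word list
sample_words = ['walked', 'jumping', 'red', 'edit', 'abandoned', 'education']

# 'ed$' matches only when 'ed' sits at the very end of the string
ed_final = [w for w in sample_words if re.search('ed$', w)]

# 'ed' with no anchor matches wherever 'ed' occurs inside the string
ed_anywhere = [w for w in sample_words if re.search('ed', w)]

print(ed_final)
print(ed_anywhere)
```

With the anchor, words such as edit and education are excluded, because their ed does not fall at the end of the word.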
