Expanding and removing chunks with regular expressions

Three RegexpChunkRule subclasses are not supported by RegexpChunkRule.fromstring() or RegexpParser, so you must create them manually if you want to use them. These rules are as follows:

  • ExpandLeftRule: Add unchunked (chink) words to the left of a chunk
  • ExpandRightRule: Add unchunked (chink) words to the right of a chunk
  • UnChunkRule: Unchunk any matching chunk
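As a rough illustration of creating one of these rules manually, the following sketch builds an UnChunkRule by hand and passes it to a RegexpChunkParser. The sentence, tag patterns, and descriptions are assumptions chosen for the example, not taken from the recipe.

```python
from nltk.chunk import RegexpChunkParser
from nltk.chunk.regexp import ChunkRule, UnChunkRule
from nltk.tree import Tree

# Hypothetical tagged sentence, used only for illustration.
sent = [('the', 'DT'), ('sushi', 'NN'), ('rolls', 'NNS')]

# Chunk a determiner followed by one or more nouns.
chunk_rule = ChunkRule('<DT><NN.*>+', 'determiner plus nouns')

# UnChunkRule is constructed directly from a tag pattern and a description.
unchunk_rule = UnChunkRule('<DT><NN.*>+', 'unchunk determiner plus nouns')

# With only the chunk rule, the whole phrase ends up in a single NP chunk.
chunked = RegexpChunkParser([chunk_rule]).parse(sent)

# Adding the unchunk rule removes that chunk again, leaving a flat tree.
unchunked = RegexpChunkParser([chunk_rule, unchunk_rule]).parse(sent)

print(chunked)
print(unchunked)
```

Rules are applied in the order given, so the UnChunkRule only has something to undo because the ChunkRule ran first.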

How to do it...

ExpandLeftRule and ExpandRightRule each take two patterns and a description as arguments. For ExpandLeftRule, the first (left) pattern is the chink we want to add to the beginning of the chunk, while the second (right) pattern matches the beginning of the chunk we want to expand. With ExpandRightRule, the left ...
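The argument order for ExpandLeftRule can be sketched as follows. This is a minimal example under assumed inputs: the sentence, the patterns, and the variable names are hypothetical and serve only to show the left pattern (the chink) and the right pattern (the start of an existing chunk).

```python
from nltk.chunk import RegexpChunkParser
from nltk.chunk.regexp import ChunkRule, ExpandLeftRule
from nltk.tree import Tree

# Hypothetical tagged sentence, used only for illustration.
sent = [('the', 'DT'), ('sushi', 'NN'), ('rolls', 'NNS')]

# First create a chunk to expand: a single singular noun.
noun_rule = ChunkRule('<NN>', 'single noun')

# Left pattern: the chink to absorb. Right pattern: the start of the chunk.
expand_left = ExpandLeftRule('<DT>', '<NN>', 'pull in left determiner')

# Rules run in order: chunk the noun, then grow the chunk leftwards.
tree = RegexpChunkParser([noun_rule, expand_left]).parse(sent)
print(tree)
```

Here the determiner joins the noun's chunk, while the plural noun stays outside because no rule in this sketch expands the chunk to the right.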
