Extracting location chunks
To identify LOCATION
chunks, we can make a different kind of ChunkParserI
subclass that uses the gazetteers
corpus to identify location words. The gazetteers
corpus is a WordListCorpusReader
class that contains the following location words:
- Country names
- U.S. states and abbreviations
- Major U.S. cities
- Canadian provinces
- Mexican states
How to do it...
The
LocationChunker
class, found in chunkers.py
, iterates over a tagged sentence looking for words that are found in the gazetteers
corpus. When it finds one or more location words, it creates a LOCATION
chunk using IOB tags. The helper method iob_locations()
is where the IOB LOCATION
tags are produced, and the parse()
method converts these IOB tags into a Tree
:
from nltk.chunk ...
Get Natural Language Processing: Python and NLTK now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.