Mapping Words to Properties Using Python Dictionaries

As we have seen, a tagged word of the form (word, tag) is an association between a word and a part-of-speech tag. Once we start doing part-of-speech tagging, we will be creating programs that assign to each word the tag that is most likely in its context. We can think of this process as a mapping from words to tags. The most natural way to store mappings in Python uses the so-called dictionary data type (also known as an associative array or hash array in other programming languages). In this section, we look at dictionaries and see how they can represent a variety of language information, including parts of speech.
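As a first illustration of the word-to-tag mapping described above, here is a minimal sketch using a plain Python dictionary. The words and tags are illustrative choices, not output from a tagger or a tagged corpus. The sketch shows a dictionary built key by key, the same mapping built from a list of (word, tag) pairs, and a lookup that goes from a word to its tag.

```python
# Illustrative word -> tag mapping (tags chosen by hand, not from a corpus).
pos = {}
pos['colorless'] = 'ADJ'
pos['ideas'] = 'N'
pos['sleep'] = 'V'
pos['furiously'] = 'ADV'

# Look up a word (the key) to get back its tag (the value).
print(pos['ideas'])            # N

# The same mapping built directly from a list of (word, tag) pairs.
tagged = [('colorless', 'ADJ'), ('ideas', 'N'),
          ('sleep', 'V'), ('furiously', 'ADV')]
pos_from_pairs = dict(tagged)
print(pos_from_pairs == pos)   # True

# Iterating over a dictionary yields its keys; sort them for a stable order.
print(sorted(pos))             # ['colorless', 'furiously', 'ideas', 'sleep']

# Test membership before looking up a word that may be missing;
# pos['green'] would raise KeyError.
print('green' in pos)          # False
```

Converting a list of (word, tag) pairs with dict() keeps only the last tag seen for each word, which is worth bearing in mind once the same word can carry different tags in different contexts.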

Indexing Lists Versus Dictionaries

A text, as we have seen, is treated in Python as a list of words. An important property of lists is that we can “look up” a particular item by giving its index, e.g., text1[100]. Notice how we specify a number and get back a word. We can think of a list as a simple kind of table, as shown in Figure 5-2.


Figure 5-2. List lookup: We access the contents of a Python list with the help of an integer index.
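The list lookup in Figure 5-2 can be sketched in a few lines. The short word list below is an illustrative stand-in for a text such as text1 and is not its actual contents. It shows that an integer index gets back a word, while going from a word to its position is not a lookup at all: list.index() has to scan the list, and indexing a list with a string is an error.

```python
# Illustrative word list (an assumption; not the contents of text1).
words = ['Call', 'me', 'Ishmael', '.']

# List lookup: give an integer index, get back a word.
print(words[0])    # Call
print(words[2])    # Ishmael
print(words[-1])   # .   (negative indices count back from the end)

# The reverse direction: list.index() searches from the start
# until it finds a matching item.
print(words.index('Ishmael'))  # 2

# A list cannot be indexed by a word.
try:
    words['Ishmael']
except TypeError as err:
    print('TypeError:', err)
```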

Contrast this situation with frequency distributions (Computing with Language: Simple Statistics), where we specify a word and get back a number, e.g., fdist['monstrous'], which tells us the number of times a given word has occurred in a text. Lookup using words is familiar to anyone ...
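The reverse direction, giving a word and getting back a number, can be sketched with collections.Counter. NLTK's FreqDist is a subclass of Counter, so lookup by word works the same way; Counter is used here only so that the sketch runs without NLTK installed. The token list is a toy example, not a real text.

```python
from collections import Counter

# Toy token list (an illustrative assumption, not a real corpus).
tokens = ['the', 'monstrous', 'whale', 'and', 'the', 'monstrous', 'sea', 'the']

# Counter maps each distinct word to the number of times it occurs.
fdist = Counter(tokens)

# Lookup by word: give a word, get back a count.
print(fdist['monstrous'])     # 2
print(fdist['the'])           # 3

# An unseen word gives 0 rather than raising KeyError,
# and the lookup does not add the word to the counter.
print(fdist['ship'])          # 0
print('ship' in fdist)        # False

# The most frequent word together with its count.
print(fdist.most_common(1))   # [('the', 3)]
```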
