As we have seen, a tagged word of the form
(word, tag) is an association between a word and a part-of-speech
tag. Once we start doing part-of-speech tagging, we will be creating
programs that assign to each word the tag that is most likely in a
given context. We can think of this process as mapping from words to tags. The most natural
way to store mappings in Python uses the so-called dictionary data type (also known as an
associative array or hash array in other programming languages). In
this section, we look at dictionaries and see how they can represent a
variety of language information, including parts of speech.
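
As a minimal sketch of this idea, a mapping from words to tags can be written as a plain Python dictionary. The name `pos`, the words, and the tag labels below are illustrative assumptions rather than the output of a real tagger, and storing a single tag per word ignores the role of context.

```python
# Hypothetical word-to-tag mapping; the entries are illustrative only.
pos = {
    'colorless': 'ADJ',
    'ideas': 'N',
    'sleep': 'V',
    'furiously': 'ADV',
}

# Look up the tag for a word by using the word itself as the key.
print(pos['ideas'])

# .get() returns a fallback value instead of raising KeyError
# when the word is not in the dictionary.
print(pos.get('green', 'UNK'))
```

Here the key is a word and the value is its tag, so the dictionary plays the role of the (word, tag) association described above.
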
A text, as we have seen, is treated in Python as a list of
words. An important property of lists is that we can “look up” a
particular item by giving its index, e.g.,
text1[100]. Notice how we specify a number
and get back a word. We can think of a list as a simple kind of table,
as shown in Figure 5-2.
Figure 5-2. List lookup: We access the contents of a Python list with the help of an integer index.
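
The list lookup in the figure can be sketched with a small stand-in list. The name `words` and its contents are hypothetical, chosen so that the example runs without loading a corpus.

```python
# A short stand-in for a text: a list of word tokens (hypothetical data).
words = ['Call', 'me', 'Ishmael', '.']

# Indexing with an integer returns the word at that position,
# counting from 0.
print(words[2])

# A list cannot be indexed by a word; attempting it raises TypeError.
try:
    words['Ishmael']
except TypeError as err:
    print('TypeError:', err)
```

The lookup only works in one direction: a number goes in and a word comes out.
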
Contrast this situation with frequency distributions (see Computing with Language: Simple Statistics), where we
specify a word and get back a number, e.g.,
fdist['monstrous'], which tells us the number of times a given word has occurred in a text. Lookup using words is familiar to anyone ...