A lexicon, or lexical resource, is a collection of words and/or
phrases along with associated information, such as part-of-speech and
sense definitions. Lexical resources are secondary to texts, and are
usually created and enriched with the help of texts. For example, if we
have defined a text
vocab = sorted(set(my_text)) builds
the vocabulary of
FreqDist(my_text) counts the frequency of each word in the text. Both
word_freq are simple lexical resources.
Similarly, a concordance like the one we saw in Computing with Language: Texts and Words gives us
information about word usage that might help in the preparation of a
dictionary. Standard terminology for lexicons is illustrated in Figure 2-5. A lexical
entry consists of a headword (also known as a lemma) along with additional information, such
as the part-of-speech and the sense definition. Two distinct words
having the same spelling are called homonyms.
Figure 2-5. Lexicon terminology: Lexical entries for two lemmas having the same spelling (homonyms), providing part-of-speech and gloss information.
The simplest kind of lexicon is nothing more than a sorted list of words. Sophisticated lexicons include complex structure within and across the individual entries. In this section, we’ll look at some lexical resources included with NLTK.
NLTK includes ...