Part-of-speech tagging is the process of assigning a part-of-speech tag to each word in a text. Most of the time, a tagger must first be trained on a training corpus. How to train and use a tagger is covered in detail in Chapter 4, Part-of-speech Tagging, but first we must know how to create and use a training corpus of part-of-speech tagged words.
The simplest format for a tagged corpus is of the form word/tag. The following is an excerpt from the
The/at-tl expense/nn and/cc time/nn involved/vbn are/ber astronomical/jj ./.
Each word has a tag denoting its part-of-speech. For example, nn refers to a noun, while a tag that starts with vb is a verb.
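To make the word/tag format concrete, here is a minimal sketch in plain Python that splits the excerpt above into (word, tag) pairs and then filters by tag. The helper name parse_tagged_sentence is hypothetical and is not part of any library; it only illustrates the format and assumes that tokens are separated by whitespace and that the last slash in a token separates the word from its tag.

```python
def parse_tagged_sentence(line, sep="/"):
    """Split a line of 'word/tag' tokens into (word, tag) pairs.

    Hypothetical helper for illustration only. A token without a
    separator is returned with a tag of None.
    """
    pairs = []
    for token in line.split():
        if sep not in token:
            pairs.append((token, None))
            continue
        # Split on the LAST separator so that a word containing '/'
        # keeps its own slashes and only the final part is the tag.
        word, _, tag = token.rpartition(sep)
        pairs.append((word, tag))
    return pairs


sentence = ("The/at-tl expense/nn and/cc time/nn involved/vbn "
            "are/ber astronomical/jj ./.")
tagged = parse_tagged_sentence(sentence)

# Words tagged exactly 'nn' (nouns in this tagset).
nouns = [word for word, tag in tagged if tag == "nn"]

# Words whose tag starts with 'vb' (verbs in this tagset).
verbs = [word for word, tag in tagged if tag and tag.startswith("vb")]

print(tagged)
print(nouns)
print(verbs)
```

Splitting on the last separator is a deliberate choice here: the final token, ./., then parses as the word "." with the tag ".", and a word that contains a slash keeps it.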
Different corpora can use different ...