HmmChunker uses an HMM to perform chunking over tokenized character sequences. Instances contain an HMM decoder for the model and a tokenizer factory. The chunker requires the states of the HMM to conform to a token-by-token encoding of a chunking. It uses the tokenizer factory to break the chunks down into sequences of tokens and tags. Refer to the Hidden Markov Models (HMM) – part of speech recipe in Chapter 4, Tagging Words and Tokens.
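The token-by-token encoding mentioned above can be illustrated with a small, self-contained sketch of the common BIO scheme, where each token is tagged as beginning a chunk (B-TYPE), continuing one (I-TYPE), or lying outside any chunk (O). This is only an illustration of the idea; the class, method names, and tag labels here are our own and are not part of the LingPipe API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of a token-by-token BIO encoding of a chunking: the kind of
// tag sequence an HMM chunker's states correspond to. All names here
// are illustrative, not LingPipe classes.
public class BioEncoding {

    // Encode chunk spans (given as [start, end) token indices, one
    // type per chunk) over a token sequence as BIO tags.
    static List<String> encode(List<String> tokens, int[][] chunks, String[] types) {
        List<String> tags = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            tags.add("O"); // default: outside any chunk
        }
        for (int c = 0; c < chunks.length; c++) {
            int start = chunks[c][0];
            int end = chunks[c][1];
            tags.set(start, "B-" + types[c]); // chunk-initial token
            for (int i = start + 1; i < end; i++) {
                tags.set(i, "I-" + types[c]); // chunk-continuation tokens
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("John", "Smith", "lives", "in", "Madrid");
        int[][] chunks = { {0, 2}, {4, 5} };
        String[] types = { "PER", "LOC" };
        System.out.println(encode(tokens, chunks, types));
        // [B-PER, I-PER, O, O, B-LOC]
    }
}
```

Under this encoding, decoding a chunking reduces to finding the most likely tag sequence for the tokens, which is exactly what the HMM decoder does.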
We'll look at training HmmChunker and using it for the CoNLL 2002 Spanish task. You can, and should, use your own data, but this recipe assumes that training data will be in the
Training is done using an ObjectHandler, which supplies the training instances.
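To make the ObjectHandler role concrete, here is a minimal sketch of the callback pattern it embodies: a corpus walks its training instances and hands each one to the handler, which a trainable chunker would use to update its counts. The interface and classes below are local stand-ins written for this sketch, not the actual LingPipe types:

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch of the handler/callback pattern used for training:
// a corpus feeds instances, one at a time, to a handler. The real
// LingPipe interface has a similar single handle(E) method; everything
// here is an illustrative stand-in, not the library's API.
public class HandlerSketch {

    interface ObjectHandler<E> {
        void handle(E trainingInstance); // called once per instance
    }

    // Stands in for a trainer; here it only counts what it is given,
    // where a real trainer would update HMM emission/transition counts.
    static class CountingHandler implements ObjectHandler<String> {
        int count = 0;
        public void handle(String chunking) {
            count++;
        }
    }

    public static void main(String[] args) {
        CountingHandler trainer = new CountingHandler();
        List<String> corpus = Arrays.asList("sentence 1", "sentence 2", "sentence 3");
        for (String c : corpus) {
            trainer.handle(c); // the corpus visits each training instance
        }
        System.out.println(trainer.count); // 3
    }
}
```

The advantage of this design is that the trainer never needs to know how the corpus is stored or parsed; it only sees a stream of ready-made training instances.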
As we want to train ...