O'Reilly logo

Python Text Processing with NLTK 2.0 Cookbook by Jacob Perkins

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Training a decision tree classifier

The DecisionTreeClassifier works by creating a tree structure, where each node corresponds to a feature name, and the branches correspond to the feature values. Tracing down the branches, you get to the leaves of the tree, which are the classification labels.

Getting ready

For the DecisionTreeClassifier to work for text classification, you must use NLTK 2.0b9 or later. This is because earlier versions are unable to deal with unknown features. If the DecisionTreeClassifier encountered a word/feature that it hadn't seen before, then it raised an exception. This bug has now been fixed by yours truly, and is included in all NLTK versions since 2.0b9.

How to do it...

Using the same train_feats and test_feats we created ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required