Chapter 7. Text Classification

In this chapter, we will cover the following recipes:

  • Bag of words feature extraction
  • Training a Naive Bayes classifier
  • Training a decision tree classifier
  • Training a maximum entropy classifier
  • Training scikit-learn classifiers
  • Measuring precision and recall of a classifier
  • Calculating high information words
  • Combining classifiers with voting
  • Classifying with multiple binary classifiers
  • Training a classifier with NLTK-Trainer

Introduction

Text classification is a way to categorize documents or pieces of text. By examining the word usage in a piece of text, classifiers can decide what class label to assign to it. A binary classifier decides between two labels, such as positive or negative. The text can either be one label or ...
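The core idea, that a classifier picks a label by examining word usage, can be sketched in plain Python. This is a minimal illustration under stated assumptions. The toy training sentences and the names `bag_of_words`, `train_word_counts` and `classify` are hypothetical and are not taken from the recipes. The scoring rule is a deliberately simple word-count vote, not Naive Bayes or any other algorithm covered in this chapter. The `(featureset, label)` pairs, with a featureset being a dict of feature names to values, mirror the input shape that NLTK classifiers such as `NaiveBayesClassifier.train` accept.

```python
from collections import Counter

def bag_of_words(words):
    # Every word present in the text becomes a feature with value True;
    # word order and word frequency are discarded.
    return {word: True for word in words}

# Hypothetical toy training data: (list of words, label) pairs for a
# binary classifier choosing between "pos" and "neg".
train = [
    ("a great and fun movie".split(), "pos"),
    ("what a wonderful story".split(), "pos"),
    ("a boring and awful plot".split(), "neg"),
    ("terrible acting and a dull story".split(), "neg"),
]

# (featureset, label) pairs: the shape NLTK classifiers train on.
labeled_featuresets = [(bag_of_words(words), label) for words, label in train]

def train_word_counts(labeled_featuresets):
    # Count how many training texts of each label contained each feature.
    counts = {}
    for featureset, label in labeled_featuresets:
        counts.setdefault(label, Counter()).update(featureset.keys())
    return counts

def classify(counts, featureset):
    # Toy scoring (NOT Naive Bayes): choose the label whose training texts
    # most often contained the words in this featureset. Ties go to the
    # alphabetically first label.
    def score(label):
        return sum(counts[label][word] for word in featureset)
    return max(sorted(counts), key=score)

counts = train_word_counts(labeled_featuresets)
print(classify(counts, bag_of_words("a fun and wonderful movie".split())))  # → pos
```

Real classifiers replace the naive vote with a learned model, but the pipeline stays the same: extract features from the words, train on labeled featuresets, then classify a new featureset.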
