Chapter 4.  Text Classification

This chapter gives a brief introduction to text classification and walks through an example of the Naïve Bayes algorithm, developed from scratch to show how to turn an equation into code.

In this chapter, we will cover:

  • Learning and classification
  • Bayesian classification
  • Naïve Bayes algorithm
  • E-mail subject line tester
  • The data
  • The algorithm
  • Classifier accuracy

Learning and classification

When we want to automatically identify which category a specific value belongs to (a categorical value), we need to implement an algorithm that decides the most likely category for the value based on previous data. Such an algorithm is called a classifier. In the words of Tom Mitchell:

"How can we build computer systems that automatically ...

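To make the idea of "deciding the most likely category based on previous data" concrete, here is a minimal sketch of a word-count classifier in the Naïve Bayes style. It is not the implementation developed later in this chapter: the function names `train` and `classify`, the labels "spam" and "ham", the add-one smoothing, and the four training subject lines are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count words per category from (text, category) pairs."""
    word_counts = defaultdict(Counter)   # category -> word -> count
    category_counts = Counter()          # category -> number of examples
    for text, category in examples:
        category_counts[category] += 1
        word_counts[category].update(text.lower().split())
    return word_counts, category_counts


def classify(text, word_counts, category_counts):
    """Return the category with the highest log-probability score."""
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_examples = sum(category_counts.values())
    best_category, best_score = None, float("-inf")
    for category, n in category_counts.items():
        # Prior: share of training examples in this category.
        score = math.log(n / total_examples)
        # Add-one smoothing keeps unseen words from zeroing the score.
        denominator = sum(word_counts[category].values()) + len(vocabulary)
        for word in text.lower().split():
            score += math.log((word_counts[category][word] + 1) / denominator)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical "previous data": subject lines with known categories.
training_data = [
    ("win a free prize now", "spam"),
    ("free money offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project status report", "ham"),
]
model = train(training_data)
prediction = classify("free prize offer", *model)
```

With this toy data, `prediction` is the category whose past subject lines share the most words with the new one. Working with sums of logarithms rather than products of probabilities is a common design choice that avoids numeric underflow on longer texts.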