Chapter 18. Naive Bayes

18.0 Introduction

Bayes’ theorem is the premier method for understanding the probability of some event, P(A | B), given some new information, P(B | A), and a prior belief in the probability of the event, P(A):

P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
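To get a concrete sense of the arithmetic, here is a minimal sketch in Python; the specific probability values are made-up assumptions used only for illustration:

# Illustrative Bayes' theorem calculation with made-up numbers.
# P(A) is the prior, P(B | A) is the probability of the new information
# given A, and P(B) is the overall probability of that information.
p_a = 0.01          # prior belief, P(A)        (assumed value)
p_b_given_a = 0.9   # P(B | A)                  (assumed value)
p_b = 0.05          # P(B)                      (assumed value)

# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # 0.18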

The Bayesian method’s popularity has skyrocketed in the last decade, increasingly rivaling traditional frequentist applications in academia, government, and business. In machine learning, one application of Bayes’ theorem to classification comes in the form of the naive Bayes classifier. Naive Bayes classifiers combine a number of desirable qualities in practical machine learning into a single classifier. These include:

  1. An intuitive approach

  2. The ability to work with small data

  3. Low computation costs for training and prediction

  4. Often solid results in a variety of settings (see the brief sketch after this list)
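As a minimal sketch of how little code a working classifier requires, the following assumes scikit-learn is installed and uses its bundled iris dataset with GaussianNB, one of scikit-learn's naive Bayes implementations:

# Minimal naive Bayes sketch using scikit-learn's Gaussian naive Bayes
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Load a small, bundled dataset
features, target = load_iris(return_X_y=True)

# Training is a single, inexpensive pass over the data
classifier = GaussianNB()
model = classifier.fit(features, target)

# Predict the class of a new observation (feature values are illustrative)
print(model.predict([[5.0, 3.4, 1.5, 0.2]]))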

Specifically, a naive Bayes classifier is based on:

P(y \mid x_1, \cdots, x_j) = \frac{P(x_1, \cdots, x_j \mid y)\, P(y)}{P(x_1, \cdots, x_j)}

where:

  • P(y | x1, ⋯, xj) is called the posterior and is the probability that an observation is class y given the observation’s values for the j features, x1, ⋯, xj.

  • P(x1, ⋯, xj | y) is called the likelihood and is the likelihood of an observation’s values for the features, x1, ⋯, xj, given their class, y.

  • P(y) is called the prior and is our belief for the probability of class y before looking at the data.

  • P(x1, ⋯, xj) is called the marginal probability.

In naive Bayes, we compare an observation’s posterior probability for each possible class. Because the marginal probability is constant across these comparisons, in practice we compare only the numerators of the posterior for each class, and the class with the greatest numerator becomes the predicted class.
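The following is a hedged sketch of that comparison done by hand, assuming NumPy and SciPy are available, Gaussian per-feature likelihoods, and the conditional independence of features given the class that the "naive" in naive Bayes implies (so the likelihood factorizes into per-feature terms); the data and class labels are made up for illustration:

# Compare per-class posterior numerators, P(x1, ..., xj | y) * P(y)
import numpy as np
from scipy.stats import norm

# Tiny made-up training data: two features, two classes (0 and 1)
X = np.array([[1.0, 2.1], [1.2, 1.9], [0.9, 2.0],   # class 0
              [3.0, 4.2], [3.1, 3.8], [2.9, 4.0]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

new_obs = np.array([1.1, 2.0])  # observation to classify

numerators = {}
for c in np.unique(y):
    X_c = X[y == c]
    prior = X_c.shape[0] / X.shape[0]                # P(y)
    # Likelihood factorizes across features under the naive assumption
    likelihood = np.prod(norm.pdf(new_obs,
                                  loc=X_c.mean(axis=0),
                                  scale=X_c.std(axis=0)))
    numerators[c] = likelihood * prior               # posterior numerator

# The class with the greatest posterior numerator is the prediction
predicted_class = max(numerators, key=numerators.get)
print(numerators, predicted_class)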
