Naïve Bayes is a really interesting model. It's somewhat similar to k-NN in the sense that it makes some assumptions that might oversimplify reality (chiefly, that the features are independent of one another given the class), but it still performs well in many cases.
In this recipe, we'll use Naïve Bayes to do document classification with sklearn. An example I have personal experience with is taking the words that make up an account descriptor in accounting, such as Accounts Payable, and determining whether it belongs to the Income Statement, Cash Flow Statement, or Balance Sheet.
The basic idea is to use the word frequencies from a labeled training corpus to learn the classifications of the documents. Then, we can turn this on a test set and attempt to predict the labels.
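To make the idea concrete, here is a minimal sketch of word-frequency classification using the accounting example from above. The account descriptors and their labels are made up for illustration and are not the dataset used in this recipe. The sketch also assumes that `CountVectorizer` (for word counts) and `MultinomialNB` are a reasonable pairing here; other vectorizers or Naïve Bayes variants could be swapped in.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labeled corpus of account descriptors (illustration only).
train_docs = [
    "accounts payable",
    "accounts receivable",
    "accrued liabilities",
    "sales revenue",
    "cost of goods sold",
    "interest expense",
    "cash paid to suppliers",
    "cash received from customers",
]
train_labels = [
    "Balance Sheet",
    "Balance Sheet",
    "Balance Sheet",
    "Income Statement",
    "Income Statement",
    "Income Statement",
    "Cash Flow Statement",
    "Cash Flow Statement",
]

# CountVectorizer turns each descriptor into a vector of word counts;
# MultinomialNB learns per-class word frequencies from those counts.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_docs, train_labels)

# Descriptors the model has not seen; words outside the learned
# vocabulary (such as "trade") are simply ignored by the vectorizer.
test_docs = ["accounts payable trade", "cash paid to employees"]
predicted = model.predict(test_docs)

for doc, label in zip(test_docs, predicted):
    print(f"{doc!r} -> {label}")
```

With a corpus this small the predictions mostly reflect which class shares the most words with each new descriptor, so treat it as a demonstration of the workflow rather than a usable classifier.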
We'll use the ...