The *Fisher method*, named for R. A. Fisher, is
an alternative method that's been shown to give very accurate results,
particularly for spam filtering. This is the method used by
*SpamBayes*, an Outlook plug-in written in Python.
Unlike the naïve Bayesian filter, which uses the feature probabilities
to create a whole-document probability, the Fisher method calculates the
probability of a category for each feature in the document, then
combines those probabilities and tests whether the resulting set
is more or less likely than a random set. This method also returns a
probability for each category that can be compared to the others.
Although this is a more complex method, it is worth learning because it
allows much greater flexibility when choosing cutoffs for
categorization.
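
The "combine and test against a random set" step can be sketched in a few lines. This is a minimal illustration of the standard form of Fisher's method, not necessarily the exact code used later in this chapter or in SpamBayes: the per-feature probabilities are multiplied together, the natural log of the product is multiplied by -2, and the result is compared against a chi-squared distribution with twice as many degrees of freedom as there are probabilities. The names `inv_chi2` and `fisher_combine` are hypothetical, and the sketch assumes every probability is greater than 0.

```python
import math

def inv_chi2(chi, df):
    """Upper-tail probability of a chi-squared value; valid for even df only."""
    m = chi / 2.0
    term = math.exp(-m)
    total = term
    # Closed-form series for even degrees of freedom:
    # exp(-m) * sum(m**i / i! for i in 0 .. df/2 - 1)
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)  # guard against floating-point overshoot

def fisher_combine(probs):
    """Combine per-feature probabilities (each assumed to be in (0, 1])."""
    score = -2.0 * sum(math.log(p) for p in probs)
    return inv_chi2(score, 2 * len(probs))

# Collectively high probabilities combine to a value near 1,
# collectively low ones to a value near 0.
high = fisher_combine([0.9, 0.9])
low = fisher_combine([0.1, 0.1])
```

With a single probability the function returns that probability unchanged, which is a quick sanity check on the series.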

With the naïve Bayesian filter discussed earlier, you combined
all of the *Pr(feature | category)* results to get
an overall document probability, and then flipped it around at the
end. In this section, you'll begin by calculating how likely it is
that a document fits into a category given that a particular feature
is in that document—that is, *Pr(category |
feature)*. If the word "casino" appears in 500 documents,
and 499 of those are in the bad category, "casino" will get a score
very close to 1 for bad.

The normal way to calculate *Pr(category |
feature)* would be:

(number of documents in this category with the feature) / (total number of documents with the feature) ...
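
As a rough sketch of that ratio, the snippet below counts how many documents in each category contain a feature and divides accordingly. The storage layout (`feature_counts`) and the function names `train` and `category_given_feature` are assumptions made for this illustration, not the classifier's actual API, and returning 0.0 for a feature that has never been seen is likewise an arbitrary choice.

```python
from collections import defaultdict

# Hypothetical store: feature -> category -> number of documents containing it.
feature_counts = defaultdict(lambda: defaultdict(int))

def train(features, category):
    """Record one document's distinct features under the given category."""
    for feature in set(features):
        feature_counts[feature][category] += 1

def category_given_feature(feature, category):
    """Pr(category | feature) as a plain ratio of document counts."""
    total = sum(feature_counts[feature].values())
    if total == 0:
        return 0.0  # unseen feature; this fallback is an assumption
    return feature_counts[feature][category] / total

# The "casino" example from the text: 500 documents, 499 of them bad.
for _ in range(499):
    train(["casino"], "bad")
train(["casino"], "good")

casino_bad = category_given_feature("casino", "bad")  # 499 / 500 = 0.998
```

Because the score is a simple ratio, a feature seen only a handful of times can swing to 0 or 1 very easily, so the result is most meaningful for features with many observations.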
