The Fisher method, named for R. A. Fisher, is an alternative method that's been shown to give very accurate results, particularly for spam filtering. This is the method used by SpamBayes, an Outlook plug-in written in Python. Unlike the naïve Bayesian filter, which uses the feature probabilities to create a whole document probability, the Fisher method calculates the probability of a category for each feature in the document, then combines the probabilities and tests to see if the set of probabilities is more or less likely than a random set. This method also returns a probability for each category that can be compared to the others. Although this is a more complex method, it is worth learning because it allows much greater flexibility when choosing cutoffs for categorization.
With the naïve Bayesian filter discussed earlier, you combined all of the Pr(feature | category) results to get an overall document probability, and then used Bayes' Theorem to flip it around at the end. In this section, you'll begin by calculating how likely it is that a document fits into a category given that a particular feature is in that document, that is, Pr(category | feature). For example, if the word "casino" appears in 500 documents, and 499 of those are in the bad category, "casino" will get a score of 499/500 = 0.998 for bad, which is very close to 1.
The normal way to calculate Pr(category | feature) would be:
(number of documents in this category with the feature) / (total number of documents with the feature) ...
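The ratio above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the classifier code used elsewhere in this chapter: the function name `category_given_feature`, the `(features, category)` training format, and the choice to return 0.0 for a feature that has never been seen are all hypothetical.

```python
def category_given_feature(documents, feature, category):
    """Fraction of the documents containing `feature` that are in `category`.

    `documents` is a list of (set_of_features, category) pairs; this format
    is a hypothetical stand-in for real training data.
    """
    # Categories of every document that contains the feature
    with_feature = [cat for features, cat in documents if feature in features]
    if not with_feature:
        return 0.0  # assumption: a feature never seen in training scores 0
    in_category = sum(1 for cat in with_feature if cat == category)
    return in_category / len(with_feature)


# The "casino" example from the text: 500 documents, 499 of them bad
documents = [({"casino"}, "bad")] * 499 + [({"casino"}, "good")]
print(category_given_feature(documents, "casino", "bad"))  # 0.998
```

Because the denominator counts documents in every category, the scores for a given feature always sum to 1 across categories.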