The **Maximum Entropy** classifier
uses a model that is very similar to the model employed by the naive
Bayes classifier. But rather than using probabilities to set the model’s parameters, it uses search
techniques to find a set of parameters that will maximize the
performance of the classifier. In particular, it looks for the set of
parameters that maximizes the **total likelihood** of the training
corpus, which is defined as:

Example 6-18.

*P*(*features*) = Σ_{x ∈ corpus} *P*(*label*(*x*) | *features*(*x*))

Where *P*(*label|features*),
the probability that an input whose features are
*features* will have class label
*label*, is defined as:

Example 6-19.

*P*(*label* | *features*) = *P*(*label*, *features*) / Σ_{label} *P*(*label*, *features*)
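The two definitions above can be illustrated with a small sketch. The joint scores below are made-up numbers standing in for *P*(*label*, *features*), and the names `conditional_prob` and `corpus` are hypothetical, chosen only for this example. It normalizes each input's joint scores over the labels, then sums the probability assigned to the correct label across the corpus.

```python
def conditional_prob(joint, label):
    """P(label | features): the joint score for `label`, normalized over all labels."""
    total = sum(joint.values())
    return joint[label] / total


# Hypothetical toy corpus: for each input x, the joint scores
# P(label, features(x)) for every label, paired with the true label(x).
corpus = [
    ({"noun": 0.30, "verb": 0.10}, "noun"),
    ({"noun": 0.05, "verb": 0.15}, "verb"),
    ({"noun": 0.20, "verb": 0.20}, "noun"),
]

# Total likelihood as defined in this section: the sum, over the corpus,
# of the probability the model gives to each input's correct label.
total_likelihood = sum(conditional_prob(joint, label) for joint, label in corpus)

print(total_likelihood)
```

A model whose parameters gave more probability to the correct labels would score higher on this quantity, which is what the parameter search tries to achieve.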

Because of the potentially complex interactions between the
effects of related features, there is no way to directly calculate the
model parameters that maximize the likelihood of the training set.
Therefore, Maximum Entropy classifiers choose the model parameters using
**iterative optimization** techniques, which initialize the model’s parameters to random values, and then repeatedly refine those parameters to bring them closer to the optimal solution. These iterative optimization techniques guarantee that each refinement of the parameters will bring them closer to the optimal values, but do not necessarily provide a means of determining when those optimal values have been reached. Because the parameters for Maximum Entropy classifiers are selected using iterative optimization ...
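As a rough sketch of the initialize-then-refine loop, the following trains a tiny log-linear model in plain Python. Several things here are assumptions rather than anything the text specifies: the refinement rule is ordinary gradient ascent on the log of the likelihood, the joint score is taken to be the exponential of summed feature weights, and the feature names, step size, and iteration count are invented for illustration. It is not the algorithm used by any particular library.

```python
import math
import random


def label_probs(weights, features, labels):
    """Return P(label | features) for every label under a log-linear model."""
    # Assumed joint score: exp of the summed weights of the active features.
    scores = {
        label: math.exp(sum(weights.get((f, label), 0.0) for f in features))
        for label in labels
    }
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


def log_likelihood(weights, corpus, labels):
    """Sum of log P(correct label | features) over the corpus."""
    return sum(
        math.log(label_probs(weights, feats, labels)[label])
        for feats, label in corpus
    )


def train(corpus, labels, iterations=50, step=0.1, seed=0):
    rng = random.Random(seed)
    feature_names = sorted({f for feats, _ in corpus for f in feats})
    # Initialize every (feature, label) parameter to a small random value.
    weights = {
        (f, label): rng.uniform(-0.1, 0.1)
        for f in feature_names
        for label in labels
    }
    history = [log_likelihood(weights, corpus, labels)]
    for _ in range(iterations):
        # Gradient: observed count minus expected count for each parameter.
        gradient = dict.fromkeys(weights, 0.0)
        for feats, true_label in corpus:
            probs = label_probs(weights, feats, labels)
            for f in feats:
                for label in labels:
                    observed = 1.0 if label == true_label else 0.0
                    gradient[(f, label)] += observed - probs[label]
        # Refine the parameters by taking a small step along the gradient.
        for key in weights:
            weights[key] += step * gradient[key]
        history.append(log_likelihood(weights, corpus, labels))
    return weights, history


# Hypothetical training corpus: (active features, correct label).
corpus = [
    (["ends_with_ing", "follows_to"], "verb"),
    (["ends_with_s", "follows_the"], "noun"),
    (["ends_with_ing", "follows_the"], "noun"),
    (["ends_with_s", "follows_to"], "verb"),
]
labels = ["noun", "verb"]

weights, history = train(corpus, labels)
print(history[0], history[-1])
```

The loop stops after a fixed number of iterations because, as noted above, the procedure itself gives no signal that the optimal values have been reached. In practice a cap on iterations or a threshold on the per-step improvement is a common stand-in.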
