Natural Language Processing with Java and LingPipe Cookbook by Krishna Dayanidhi, Breck Baldwin

How to train and evaluate with cross validation

Earlier recipes showed how to evaluate classifiers with truth data and how to train with truth data, but how about doing both with the same data? This great idea is called cross validation, and it works as follows:

  1. Split the data into n disjoint sets, or folds; the standard n is 10.
  2. For i from 1 to n:
    • Train on the n - 1 folds that remain when fold i is excluded
    • Evaluate on fold i
  3. Report the evaluation results across all folds i.
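
The fold loop above can be sketched in plain Java. This is a minimal illustration, not LingPipe's API: the class name `CrossValidationSketch`, the helpers `split`, `crossValidate`, and `majorityLabel`, the seeded shuffle, and the toy majority-label "classifier" are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.BiFunction;
import java.util.function.Function;

public class CrossValidationSketch {

    /** Step 1: shuffle with a fixed seed, then deal items round-robin into n disjoint folds. */
    static <E> List<List<E>> split(List<E> items, int n, long seed) {
        List<E> shuffled = new ArrayList<>(items);
        Collections.shuffle(shuffled, new Random(seed));
        List<List<E>> folds = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            folds.add(new ArrayList<>());
        }
        for (int j = 0; j < shuffled.size(); j++) {
            folds.get(j % n).add(shuffled.get(j));
        }
        return folds;
    }

    /** Step 2: for each fold i, train on the other folds and evaluate on fold i. */
    static <E, M> double[] crossValidate(List<List<E>> folds,
                                         Function<List<E>, M> trainer,
                                         BiFunction<M, List<E>, Double> evaluator) {
        double[] scores = new double[folds.size()];
        for (int i = 0; i < folds.size(); i++) {
            List<E> train = new ArrayList<>();
            for (int k = 0; k < folds.size(); k++) {
                if (k != i) {
                    train.addAll(folds.get(k));
                }
            }
            M model = trainer.apply(train);
            scores[i] = evaluator.apply(model, folds.get(i));
        }
        return scores;
    }

    /** Hypothetical stand-in for a classifier: always predicts the most frequent training label. */
    static String majorityLabel(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        String best = null;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (best == null || e.getValue() > counts.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy truth data: each item is just its label (14 "e", 6 "n").
        List<String> data = new ArrayList<>();
        for (int i = 0; i < 14; i++) data.add("e");
        for (int i = 0; i < 6; i++) data.add("n");

        Function<List<String>, String> trainer = CrossValidationSketch::majorityLabel;
        // Score = accuracy of the majority label on the held-out fold.
        BiFunction<String, List<String>, Double> evaluator = (model, test) -> {
            int correct = 0;
            for (String truth : test) {
                if (truth.equals(model)) correct++;
            }
            return (double) correct / test.size();
        };

        double[] scores = crossValidate(split(data, 10, 42L), trainer, evaluator);

        // Step 3: report per-fold results and their mean.
        double sum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            System.out.printf("fold %d accuracy: %.2f%n", i, scores[i]);
            sum += scores[i];
        }
        System.out.printf("mean accuracy: %.2f%n", sum / scores.length);
    }
}
```

Passing the trainer and evaluator in as functions keeps the fold loop independent of any particular classifier, so a real classifier and evaluation metric could be substituted for the toy ones. The fixed seed makes the fold assignment repeatable between runs, which helps when comparing changes to the system.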

This is how most machine-learning systems are tuned for performance. The workflow is as follows:

  1. See what the cross validation performance is.
  2. Look at the error as determined by an evaluation metric.
  3. Look at the actual errors—yes, the data—for insights into how the system can be improved.
  4. Make some ...