How to train and evaluate with cross validation
The earlier recipes have shown how to evaluate classifiers with truth data and how to train with truth data, but what about doing both with the same data? This idea is called cross validation, and it works as follows:
- Split the data into n distinct sets, or folds; the standard n is 10.
- For i from 1 to n:
    - Train on the n - 1 folds defined by the exclusion of fold i.
    - Evaluate on fold i.
- Report the evaluation results across all n folds.
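The steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation; the `train_fn` and `eval_fn` callables are hypothetical placeholders standing in for whatever trainer and evaluation metric your classifier uses:

```python
import random

def k_fold_indices(n_items, k=10, seed=42):
    """Shuffle item indices and split them into k roughly equal folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(items, labels, train_fn, eval_fn, k=10):
    """Run k-fold cross validation and return one score per fold."""
    folds = k_fold_indices(len(items), k)
    scores = []
    for held_out in folds:
        held_out_set = set(held_out)
        # Train on the k - 1 folds defined by the exclusion of this fold.
        train_idx = [j for j in range(len(items)) if j not in held_out_set]
        model = train_fn([items[j] for j in train_idx],
                         [labels[j] for j in train_idx])
        # Evaluate on the held-out fold only.
        scores.append(eval_fn(model,
                              [items[j] for j in held_out],
                              [labels[j] for j in held_out]))
    return scores
```

Averaging the returned scores gives the overall cross validation estimate; the spread across folds hints at how stable the classifier is.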
This is how most machine-learning systems are tuned for performance. The workflow is as follows:
- Run cross validation to see how the system performs.
- Look at the error as determined by an evaluation metric.
- Look at the actual errors—yes, the data—for insights into how the system can be improved.
- Make some ...