Cross-validation

If you have run the previous experiment, you may have realized that:

  • Both the validation and test results vary, as their samples are different
  • The chosen hypothesis is often the best one, but this is not always the case

Unfortunately, relying on separate validation and test samples brings uncertainty. It also reduces the number of examples available for training, and the fewer the training examples, the higher the variance of the model's estimates.
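To see this variability concretely, the following minimal sketch repeats a single hold-out split with different random seeds and records the validation accuracy each time. The dataset (scikit-learn's bundled `load_digits`), the `SVC(gamma=0.001)` classifier, the 30% validation size and the five seeds are illustrative assumptions, not the setup of the previous experiment. Only the random sample changes between runs, so any spread in the scores comes from the split itself.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: the bundled digits dataset stands in for the experiment's data
X, y = load_digits(return_X_y=True)

# Repeat a single hold-out split with different seeds: only the sample changes
holdout_scores = []
for seed in range(5):
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.30, random_state=seed)
    # Illustrative classifier and hyperparameter, chosen only for this demo
    model = SVC(gamma=0.001).fit(X_train, y_train)
    holdout_scores.append(model.score(X_val, y_val))

print([round(s, 3) for s in holdout_scores])
print("spread: %.3f" % (max(holdout_scores) - min(holdout_scores)))
```

The gap between the best and worst seed gives a rough idea of how much a single validation score can be trusted.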

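A solution is to use cross-validation, and Scikit-learn offers a complete module for cross-validation and performance evaluation (sklearn.cross_validation; this module was deprecated in scikit-learn 0.18 and removed in 0.20, and its functionality now lives in sklearn.model_selection).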
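As a minimal sketch of k-fold cross-validation with the current API, the example below uses `KFold` and `cross_val_score` from `sklearn.model_selection`. The digits dataset, the `SVC(gamma=0.001)` model, the choice of 10 folds and the fixed seed are assumptions made for illustration. Every example is used for validation exactly once and for training in the other nine folds, and the mean and standard deviation of the fold scores summarize both performance and its variability.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

# Assumption: illustrative dataset and model, not taken from the text
X, y = load_digits(return_X_y=True)

# 10 folds; shuffle so each fold mixes all classes; fixed seed for reproducibility
cv = KFold(n_splits=10, shuffle=True, random_state=1)

# Fits a fresh clone of the model on 9 folds and scores it on the held-out fold, 10 times
scores = cross_val_score(SVC(gamma=0.001), X, y, cv=cv, scoring="accuracy")

print("accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```

Reporting the standard deviation alongside the mean makes the uncertainty of the estimate explicit, instead of hiding it behind a single hold-out number.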

By resorting to cross-validation, you'll just need to separate your data into a training and test set, and you will be ...
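One common way to organize this workflow is sketched below, under stated assumptions: the data is split once into a training set and a test set, candidate hypotheses are compared by their mean cross-validated score on the training set only, and the winner is refit on the whole training set and checked once on the test set. The digits dataset, the two hypothetical `SVC` candidates (`gamma=0.001` and `gamma=0.01`), the 30% test size and `cv=5` are illustrative choices, not the book's.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Single split: the test set is kept aside and never used for model selection
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)

# Hypothetical candidate hypotheses, compared by mean 5-fold CV score on training data
candidates = {"gamma=0.001": SVC(gamma=0.001), "gamma=0.01": SVC(gamma=0.01)}
cv_means = {name: cross_val_score(model, X_train, y_train, cv=5).mean()
            for name, model in candidates.items()}
best_name = max(cv_means, key=cv_means.get)

# Refit the winner on the full training set and evaluate it once on the test set
best_model = candidates[best_name].fit(X_train, y_train)
test_score = best_model.score(X_test, y_test)
print(cv_means, best_name, round(test_score, 3))
```

Because the choice between hypotheses rests on several folds rather than on one validation sample, it is less exposed to the luck of a single split, while the untouched test set still provides an honest final check.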
