Cross-validation

As we've seen, in practice we often lack a separate test data set with which to measure a model's performance on unseen data. The most common reason is that we have very little data overall and want to use all of it to train the model. Another is that we have already set aside part of the data as a validation set to tune model meta parameters, such as cost and gamma for SVMs with radial kernels, and don't want to shrink the remaining training data any further.
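
To make the cost of holding data back concrete, here is a minimal sketch in Python that partitions row indices into training, validation, and test sets. The function name `three_way_split`, the 20% fractions, and the 100-row data set are illustrative assumptions, not values taken from the text.

```python
import random


def three_way_split(n_rows, validation_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle row indices and partition them into train, validation, and test lists.

    Hypothetical helper for illustration; the fractions are assumed values.
    """
    indices = list(range(n_rows))
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)

    n_test = round(n_rows * test_fraction)
    n_validation = round(n_rows * validation_fraction)

    test = indices[:n_test]
    validation = indices[n_test:n_test + n_validation]
    train = indices[n_test + n_validation:]
    return train, validation, test


# With only 100 observations, two 20% hold-outs leave 60 rows for training.
train, validation, test = three_way_split(100)
print(len(train), len(validation), len(test))
```

Under these assumed fractions, each additional hold-out set removes a fifth of an already small sample from training, which is why a second split is often unaffordable.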

Whatever the reason for the lack of a test data set, we already know that we should never use our training data as a measure of model performance and generalization ...
