Generalization error and overfitting

So, how do we know that the model we have discussed is good? One obvious and ultimate criterion is its performance in practice.

One common problem that plagues the more complex models, such as decision trees and neural nets, is overfitting. The model can minimize the desired metric on the provided data, but does a very poor job on a slightly different dataset in practical deployments, Even a standard technique, when we split the dataset into training and test, the training for deriving the model and test for validating that the model works well on a hold-out data, may not capture all the changes that are in the deployments. For example, linear models such as ANOVA, logistic, and linear regression are usually ...

Get Mastering Scala Machine Learning now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.