Understanding overfitting

General overfitting occurs when a very complex statistical model suits the observed data because it has too many parameters compared to the number of observations. The risk is that an incorrect model can perfectly fit data, just because it is quite complex compared to the amount of data available. Although, it is possible for overfitting to occur when the amount of data is adequate. Consequently, when the model is used to predict new observations, there is a problem, because it is not able to generalize.

The concept of overfitting is also very important in regression analysis. Usually, a learning algorithm is trained using a set of examples (training set), the output of which is already known. It is assumed that ...

Get Regression Analysis with R now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.