MEASURES OF PREDICTIVE SUCCESS

Whatever method of validation is used, we need some measure of the success of the prediction procedure. One possibility is to use the sum of the losses in the calibration and the validation samples. Even this procedure contains an ambiguity that we need to resolve: are we more concerned with minimizing the expected loss, the average loss, or the maximum loss?
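The choice among these criteria can change which predictor looks best. The following is a minimal sketch, assuming absolute error as the loss and using made-up observations and two hypothetical predictors (`pred_a`, `pred_b`). The average observed loss is used here as the usual sample estimate of the expected loss.

```python
from statistics import mean

def absolute_losses(observed, predicted):
    """Absolute error of each prediction (an assumed choice of loss function)."""
    return [abs(y - y_hat) for y, y_hat in zip(observed, predicted)]

# Made-up data for illustration only.
observed = [10.0, 12.0, 14.0, 16.0]
pred_a = [11.0, 13.0, 15.0, 17.0]  # off by 1 on every observation
pred_b = [10.0, 12.0, 14.0, 19.0]  # exact except for a single miss of 3

loss_a = absolute_losses(observed, pred_a)
loss_b = absolute_losses(observed, pred_b)

summary = {
    "A": {"average": mean(loss_a), "maximum": max(loss_a)},
    "B": {"average": mean(loss_b), "maximum": max(loss_b)},
}

for name, stats in summary.items():
    print(name, stats)
```

In this sketch predictor B has the smaller average loss while predictor A has the smaller maximum loss, so the ranking depends on which criterion we adopt.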

One measure of goodness of fit of the model is $\mathrm{SSE} = \sum_i (y_i - \hat{y}_i)^2$, where $y_i$ and $\hat{y}_i$ denote the ith observed value and the corresponding value obtained from the model. The smaller this sum of squares, the better the fit.

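If the observations are independent, then

$$\sum_i (y_i - \hat{y}_i)^2 = \sum_i (y_i - \bar{y})^2 - \sum_i (\hat{y}_i - \bar{y})^2$$

where $\bar{y}$ is the mean of the observed values.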

The first sum on the right-hand side of the equation is the total sum of squares (SST). Most statistical software reports R² = 1 − SSE/SST as a measure of fit, where SSE is the sum of squared differences between the observed and the fitted values. The closer the value of R² is to 1, the better.
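These quantities can be computed directly. The sketch below assumes a single predictor fitted by ordinary least squares with an intercept, and uses made-up data. The helper names `fit_line` and `fit_summary` are illustrative, not taken from any statistics package.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def fit_summary(y, y_hat):
    """Return SSE, SST, and R^2 = 1 - SSE/SST."""
    y_bar = mean(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return sse, sst, 1 - sse / sst

# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

intercept, slope = fit_line(x, y)
y_hat = [intercept + slope * xi for xi in x]
sse, sst, r2 = fit_summary(y, y_hat)

# Sum of squares of the fitted values about the mean of the observations.
ssr = sum((fi - mean(y)) ** 2 for fi in y_hat)

print(f"SSE={sse:.3f}  SST={sst:.3f}  SST-SSR={sst - ssr:.3f}  R2={r2:.3f}")
```

For these data, SSE is about 2.4, SST is 6.0, and R² is about 0.6. SSE also agrees with SST minus the sum of squares of the fitted values about the mean, up to floating-point rounding.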

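The automated entry of predictors into the regression equation using R² runs the risk of overfitting, as R² can never decrease when another predictor enters the model. To compensate, one may use the adjusted R², which for a model with an intercept and p predictors is commonly written as

$$R^2_{\mathrm{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}$$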

where n is the number of observations used ...
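To see how the adjustment penalizes extra predictors, here is a minimal sketch. It assumes the common form 1 − (1 − R²)(n − 1)/(n − p − 1), with p counting predictors and excluding the intercept; other texts and packages may parameterize the formula differently. The function name `adjusted_r2` and the numbers are hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 under the assumed form 1 - (1 - R^2)(n - 1)/(n - p - 1).

    n: number of observations; p: number of predictors, intercept excluded.
    """
    if n - p - 1 <= 0:
        raise ValueError("need more than p + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical values: a fourth predictor raises R^2 only from 0.800 to 0.802.
before = adjusted_r2(0.800, n=30, p=3)
after = adjusted_r2(0.802, n=30, p=4)

print(f"adjusted R2 with 3 predictors: {before:.4f}")
print(f"adjusted R2 with 4 predictors: {after:.4f}")
```

Here the raw R² rises slightly when the fourth predictor is added, but the adjusted value falls, signaling that the extra term does not earn its place.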

Common Errors in Statistics (and How to Avoid Them), 4th Edition by Phillip I. Good, James W. Hardin
