Model Simplification

A multiple regression is a statistical model with two or more continuous explanatory variables. We contrast this kind of model with analysis of variance, where all the explanatory variables are categorical (Chapter 11), and with analysis of covariance, where the explanatory variables are a mixture of continuous and categorical (Chapter 12).

The principle of parsimony (Occam's razor), discussed in the previous chapter on p. 325, is again relevant here. It requires that the model should be as simple as possible. This means that the model should not contain any redundant parameters or factor levels. We achieve this by fitting a maximal model and then simplifying it by one or more of the following steps, illustrated in the sketch after this list:

  • Remove non-significant interaction terms.
  • Remove non-significant quadratic or other non-linear terms.
  • Remove non-significant explanatory variables.
  • Group together factor levels that do not differ from one another.
  • Amalgamate explanatory variables that have similar parameter values.
  • Set non-significant slopes to zero within ANCOVA.
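
As a minimal sketch of this process in R (the data frame dat and the variables y, x1 and x2 are hypothetical, invented here for illustration), we fit a maximal model containing the interaction and quadratic terms, then delete non-significant terms one at a time with update(), checking at each step with anova() that the simpler model does not explain significantly less variation:

    # hypothetical data: response y, continuous explanatory variables x1 and x2
    set.seed(1)
    dat <- data.frame(x1 = runif(100), x2 = runif(100))
    dat$y <- 2 + 3 * dat$x1 + rnorm(100)

    # maximal model: main effects, interaction and quadratic terms
    maximal <- lm(y ~ x1 * x2 + I(x1^2) + I(x2^2), data = dat)

    # remove the interaction term
    m2 <- update(maximal, . ~ . - x1:x2)
    anova(maximal, m2)            # F test: any significant loss of explanatory power?

    # remove the quadratic terms
    m3 <- update(m2, . ~ . - I(x1^2) - I(x2^2))
    anova(m2, m3)

    # remove the explanatory variable x2
    m4 <- update(m3, . ~ . - x2)
    anova(m3, m4)
    summary(m4)                   # the minimal adequate model

The built-in step() function automates a comparable search, using AIC rather than F tests to decide which terms to drop.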

Of course, such simplifications must make good scientific sense, and must not lead to significant reductions in explanatory power. It is likely that many of the explanatory variables are correlated with each other, and so the order in which variables are deleted from the model will influence the explanatory power attributed to them. The thing to remember about multiple regression is that, in principle, there is no end to it. The number ...
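
To see why deletion order matters, note that the sequential (Type I) sums of squares in an anova table credit shared variation to whichever variable enters the model first. In this sketch (again with invented variables), x2 is constructed to be strongly correlated with x1, and simply reversing the order of the terms changes how the explained variation is attributed:

    # two deliberately correlated explanatory variables (hypothetical data)
    set.seed(2)
    x1 <- runif(100)
    x2 <- x1 + rnorm(100, sd = 0.1)   # x2 closely tracks x1
    y  <- 2 + 3 * x1 + rnorm(100)

    anova(lm(y ~ x1 + x2))   # most of the variation is credited to x1
    anova(lm(y ~ x2 + x1))   # fitted first, x2 now takes most of the credit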
