10 Multiple Regression

In multiple regression we have a continuous response variable and two or more continuous explanatory variables (i.e. there are no categorical explanatory variables). In many applications, multiple regression is the most difficult of all the statistical models to do well. There are several things that make multiple regression so challenging:

  • the studies are often observational (rather than controlled experiments)
  • we often have a great many explanatory variables
  • we often have rather few data points
  • missing combinations of explanatory variables are commonplace

There are several important statistical issues, too:

  • the explanatory variables are often correlated with one another (non-orthogonal)
  • there are major issues about which explanatory variables to include
  • there could be curvature in the response to the explanatory variables
  • there might be interactions between explanatory variables
  • the last three issues all tend to lead to parameter proliferation
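A minimal sketch of two of these issues, using simulated data that is not taken from the text: the variable names (x1, x2, x3, y), the sample size and the coefficients are all assumptions chosen for illustration. It checks how strongly the explanatory variables are correlated, then counts how many parameters are needed once curvature and two-way interactions are allowed.

```r
# Simulated data, for illustration only
set.seed(42)
n  <- 30
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.5)   # built to be correlated with x1
x3 <- rnorm(n)
y  <- 2 + 1.5 * x1 - 0.5 * x3 + rnorm(n)
dat <- data.frame(y, x1, x2, x3)

# Non-orthogonality: pairwise correlations among the explanatory variables
cor_matrix <- cor(dat[, c("x1", "x2", "x3")])
print(round(cor_matrix, 2))

# Parameter proliferation: main effects only ...
main_only <- lm(y ~ x1 + x2 + x3, data = dat)

# ... versus main effects, all two-way interactions and quadratic terms
full <- lm(y ~ (x1 + x2 + x3)^2 + I(x1^2) + I(x2^2) + I(x3^2), data = dat)

n_main <- length(coef(main_only))   # intercept + 3 slopes
n_full <- length(coef(full))        # intercept + 3 slopes + 3 interactions + 3 quadratics
cat("Parameters, main effects only:", n_main, "\n")
cat("Parameters, with curvature and interactions:", n_full, "\n")
```

With only three explanatory variables the fuller model already needs ten parameters for 30 data points, which suggests how quickly the problem grows when there are many variables and few observations.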

There is a temptation to become personally attached to a particular model. Statisticians call this ‘falling in love with your model’. It is as well to remember the following truths about models:

  • all models are wrong
  • some models are better than others
  • the correct model can never be known with certainty
  • the simpler the model, the better it is
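One way to act on the preference for simpler models is to fit a more complicated model and a simpler one nested within it, then ask whether the extra terms are justified. The sketch below uses simulated data (the variables x1, x2 and y and their values are assumptions, not from the text) and compares the two fits with anova() and AIC(). It is one possible approach, not a prescribed procedure.

```r
# Simulated data, for illustration only: x2 has no true effect on y
set.seed(1)
n  <- 40
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
y  <- 3 + 0.7 * x1 + rnorm(n)

# A more complicated model (two main effects plus their interaction) ...
complex_model <- lm(y ~ x1 * x2)

# ... and a simpler model nested within it
simple_model <- lm(y ~ x1)

# F test: do the extra terms explain significantly more variation?
comparison <- anova(simple_model, complex_model)
print(comparison)

# AIC penalizes each extra parameter; lower values are preferred
print(AIC(simple_model, complex_model))
```

If the F test gives no evidence that the extra terms improve the fit, the simpler model would normally be retained.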

Fitting models to data is the central function of R. The process is essentially one of exploration; there are no fixed rules and no absolutes. The object is to determine ...
