In statistics, modeling is where we get down to business. Models quantify the relationships between our variables. Models let us make predictions.
A simple linear regression is the most basic model. It’s just two variables and is modeled as a linear relationship with an error term:
y_i = β_0 + β_1 x_i + ε_i
We are given the data for x and y. Our mission is to fit the model, which will give us the best estimates for β0 and β1 (Recipe 11.1).
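As a minimal sketch of what fitting means here, the following simulates data from a known line and recovers the two coefficients with lm. The data, the seed, and the true values 2 and 3 are assumptions made up for illustration; they do not come from any recipe.

```r
# Simulated data: the "true" intercept (2) and slope (3) are illustrative assumptions
set.seed(42)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100, sd = 0.5)   # linear relationship plus an error term

# Fit the simple linear regression of y on x by ordinary least squares
m <- lm(y ~ x)

# Estimated intercept and slope, labeled "(Intercept)" and "x"
print(coef(m))
```

With 100 points and modest noise, the estimates should land close to the values used to generate the data, though never exactly on them.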
That generalizes naturally to multiple linear regression, where we have multiple variables on the right-hand side of the relationship (Recipe 11.2):
y_i = β_0 + β_1 u_i + β_2 v_i + β_3 w_i + ε_i
Statisticians call u, v, and w the predictors and y the response. Obviously, the model is useful only if there is a fairly linear relationship between the predictors and the response, but that requirement is much less restrictive than you might think. Recipe 11.11 discusses transforming your variables into a (more) linear relationship so that you can use the well-developed machinery of linear regression.
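The multiple-regression model with predictors u, v, and w can be sketched the same way. Again the data are simulated, and the coefficients used to generate them (1, 2, -3, 0.5) are assumptions chosen only so there is something to recover.

```r
# Simulated predictors and response; all numeric values are illustrative assumptions
set.seed(1)
n <- 200
u <- rnorm(n)
v <- rnorm(n)
w <- rnorm(n)
y <- 1 + 2 * u - 3 * v + 0.5 * w + rnorm(n)

# Predictors go on the right-hand side of the formula, separated by +
m <- lm(y ~ u + v + w)

# One estimate per term: the intercept, then u, v, and w
print(coef(m))
```

The formula mirrors the equation: the response on the left of the tilde, the predictors on the right, with the intercept and error term implied.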
The beauty of R is that anyone can build these linear models. The models are built by a function, lm, which returns a model object. From the model object, we get the coefficients and regression statistics. It’s easy.
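To make the idea of a model object concrete, here is a small sketch, assuming simulated data, that fits a model and then pulls the coefficients and a few regression statistics out of it. The variable names and generating values are hypothetical.

```r
# Simulated data; the generating values 5 and 1.5 are illustrative assumptions
set.seed(7)
x <- runif(50, min = 0, max = 10)
y <- 5 + 1.5 * x + rnorm(50)

m <- lm(y ~ x)        # lm returns a model object of class "lm"
print(class(m))

print(coef(m))        # the estimated coefficients

s <- summary(m)       # regression statistics are computed by summary
print(s$coefficients) # estimates, standard errors, t values, p-values
print(s$r.squared)    # proportion of variance explained

print(confint(m))     # 95% confidence intervals for the coefficients
```

Keeping the fit in a variable such as m is the usual pattern: the model is fitted once, and each question about it becomes a function applied to the object.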
The horror of R is that anyone can build these models. Nothing requires you to check that the model is reasonable, much less statistically significant. Before you blindly believe a model, check it. Most of the ...