In statistics, modeling is where we get down to business. Models quantify the relationships between our variables. Models let us make predictions.

A *simple linear regression* is the most basic model: just two variables, modeled as a linear relationship with an error term:

y_i = β_0 + β_1 x_i + ε_i

We are given the data for *x* and *y*. Our mission is to *fit the model*, which will give us the best estimates for *β*_0 and *β*_1 (Recipe 11.1).
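A minimal sketch of the fit in R. The data here are simulated, and the variable names and the "true" coefficients 2 and 3 are illustrative assumptions, not values from any recipe:

```r
# Simulate data whose true relationship is y = 2 + 3x + noise
# (names x, y and coefficients 2, 3 are made up for illustration)
set.seed(42)
x <- 1:30
y <- 2 + 3 * x + rnorm(30, sd = 2)

m <- lm(y ~ x)   # fit the simple linear regression
coef(m)          # estimates of beta_0 (intercept) and beta_1 (slope)
```

With so little noise relative to the signal, the estimated slope should land close to 3.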

That generalizes naturally to multiple linear regression, where we have multiple variables on the right-hand side of the relationship (Recipe 11.2):

y_i = β_0 + β_1 u_i + β_2 v_i + β_3 w_i + ε_i
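In R the multiple-regression formula simply lists the predictors on the right of the tilde. A sketch with simulated data (the coefficients 1, 2, −1, and 0.5 are arbitrary assumptions for the example):

```r
# Three simulated predictors and a response built from them
# (true coefficients 1, 2, -1, 0.5 are arbitrary for illustration)
set.seed(1)
u <- runif(100)
v <- runif(100)
w <- runif(100)
y <- 1 + 2 * u - 1 * v + 0.5 * w + rnorm(100, sd = 0.1)

m <- lm(y ~ u + v + w)   # multiple linear regression
coef(m)                  # estimates of beta_0 through beta_3
```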

Statisticians call *u*, *v*, and *w* the *predictors* and *y* the *response*. Obviously, the model is useful only if there is a fairly linear relationship between the predictors and the response, but that requirement is much less restrictive than you might think. Recipe 11.11 discusses transforming your variables into a (more) linear relationship so that you can use the well-developed machinery of linear regression.

The beauty of R is that anyone can build these linear models. The models are built by a function, `lm`, which returns a model object. From the model object, we get the coefficients (*β*_i) and regression statistics. It’s easy.
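For instance, using the `cars` dataset built into R (a standard example, not one the text names), the model object yields coefficients and statistics through ordinary accessor functions:

```r
# Stopping distance as a function of speed, from R's built-in cars data
m <- lm(dist ~ speed, data = cars)

coef(m)      # the estimated beta_i
confint(m)   # confidence intervals for the coefficients
summary(m)   # regression statistics: R-squared, standard errors, p-values
```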

The horror of R is that anyone can build these models. Nothing requires you to check that the model is reasonable, much less statistically significant. Before you blindly believe a model, check it. Most of the ...
