Now that we’ve seen a simple example of how models work in
R, let’s describe in detail what `lm`
does and how you can control it. A linear regression model is appropriate
when the response variable (the thing that you want to predict) can be
estimated from a linear function of the predictor variables (the
information that you know). Technically, we assume that:

*y* = *c*_{0} + *c*_{1}*x*_{1} + *c*_{2}*x*_{2} + ... + *c*_{n}*x*_{n} + *ε*

where *y* is the response variable;
*x*_{1},
*x*_{2}, ...,
*x*_{n} are the predictor variables
(or predictors); *c*_{0}, *c*_{1}, ...,
*c*_{n} are the coefficients; and *ε* is the
error term.
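
To make this concrete, here is a minimal sketch of fitting such a model with `lm`. It uses the built-in `cars` data set (an assumption of this example, not data from the surrounding text), with one predictor (`speed`) and one response (`dist`):

```r
# Fit a linear model: stopping distance as a linear function of speed.
# cars ships with base R, so this example is self-contained.
fit <- lm(dist ~ speed, data = cars)

# The fitted coefficients are the estimates of c_0 (the intercept)
# and c_1 (the coefficient on speed).
coef(fit)
```

The formula `dist ~ speed` tells `lm` which variable is the response and which are the predictors; `lm` then estimates the coefficients from the data.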

Suppose that you have a matrix of observed predictor variables
*X* and a vector of response variables
*Y*. (In this sentence, I’m using the terms “matrix”
and “vector” in the mathematical sense.) We have assumed a linear model,
so given a set of coefficients *c*, we can calculate a
set of estimates *ŷ* for the input data
*X* by calculating *ŷ* =
*cX.* The differences between the estimates
*ŷ* and the actual values *Y* are
called the *residuals*. You can think of the residuals as a measure of the prediction error; small residuals mean that the predicted values are close to the actual values. We assume that the expected difference ...
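
The relationship between fitted values and residuals can be sketched in R (again using the built-in `cars` data set as an assumed example):

```r
fit <- lm(dist ~ speed, data = cars)

# y_hat holds the estimates for the input data;
# res holds the residuals.
y_hat <- fitted(fit)
res   <- residuals(fit)

# The residuals are exactly the observed responses minus the estimates.
all.equal(cars$dist - y_hat, res)
```

Small residuals here indicate that the fitted line predicts the observed stopping distances closely.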
