Models and Formulas
To statisticians, a model is a concise way to describe a set of data, usually with a mathematical formula. Sometimes, the goal is to build a predictive model with training data to predict values based on other data. Other times, the goal is to build a descriptive model that helps you understand the data better.
R has a special notation for describing relationships between variables. Suppose that you are assuming a linear model for a variable y, predicted from the variables x1, x2, ..., xn. (Statisticians usually refer to y as the dependent variable, and x1, x2, ..., xn as the independent variables.) In equation form, this implies a relationship like:
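y = c0 + c1*x1 + c2*x2 + ... + cn*xn + ε
Here c0, ..., cn are the coefficients to be estimated and ε is the error term. In R, you would write the relationship as y ~ x1 + x2 + ... + xn, which is a formula object.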
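To make "formula object" concrete, here is a minimal sketch that builds a formula and inspects it. The names f and g are illustrative only, and y, x1, and x2 are placeholders: a formula is stored unevaluated, so these variables do not need to exist when the formula is created.

```r
# Build a formula; R stores it unevaluated, so y, x1, and x2
# do not have to be defined at this point.
f <- y ~ x1 + x2

class(f)     # "formula"
all.vars(f)  # "y" "x1" "x2"

# The same formula can be built from a string, which is handy
# when the predictor names are only known at run time.
g <- as.formula("y ~ x1 + x2")
identical(all.vars(f), all.vars(g))  # TRUE
```

Because a formula is an ordinary object, it can be assigned to a variable and passed to a modeling function later.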
As an example, let’s use the cars data set (which is included in the datasets package that ships with R). This data set was recorded during the 1920s and shows the speed and stopping distance for a set of different cars. We’ll look at the relationship between speed and stopping distance, assuming that the stopping distance is a linear function of speed, and use a linear regression to estimate that relationship. The formula is dist ~ speed. We’ll use the lm function to estimate the parameters of a linear model. The lm function returns an object of class lm, which we will assign to a variable called cars.lm:
> cars.lm <- lm(formula=dist~speed,data=cars) ...
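As a rough sketch of what you might do next with the fitted object, the snippet below repeats the fit and then inspects it with a few standard accessor functions. The speed of 15 used for the prediction is an arbitrary value chosen for illustration, and the name new.speed is hypothetical.

```r
# Fit the model as in the text; the cars data set ships with R.
cars.lm <- lm(formula = dist ~ speed, data = cars)

class(cars.lm)  # "lm"

# Estimated coefficients: the intercept and the slope for speed.
coef(cars.lm)

# Predicted stopping distance at an arbitrary, illustrative speed.
new.speed <- data.frame(speed = 15)
predict(cars.lm, newdata = new.speed)
```

The newdata argument must be a data frame whose column names match the predictor names in the formula, which is why speed is wrapped in data.frame here.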