Models and Formulas

To statisticians, a model is a concise way to describe a set of data, usually with a mathematical formula. Sometimes the goal is to build a predictive model: you fit it to training data and then use it to predict values for other data. Other times, the goal is to build a descriptive model that helps you understand the data better.

R has a special notation for describing relationships between variables. Suppose that you are assuming a linear model for a variable y, predicted from the variables x1, x2, ..., xn. (Statisticians usually refer to y as the dependent variable, and x1, x2, ..., xn as the independent variables.) In equation form, this implies a relationship like:

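y = c0 + c1*x1 + c2*x2 + ... + cn*xn + ε

Here c0, c1, ..., cn are the coefficients to be estimated and ε is an error term.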

In R, you would write the relationship as y ~ x1 + x2 + ... + xn, which is a formula object.
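As a minimal sketch of what a formula object looks like in practice, the snippet below builds one directly and then builds another from a character vector. The names y, x1, x2, and x3 are placeholders chosen for illustration; they do not need to exist as variables, because a formula is stored unevaluated.

```r
# Build a formula object directly with the ~ operator.
f <- y ~ x1 + x2

class(f)      # "formula"
all.vars(f)   # the variable names used: "y" "x1" "x2"

# A formula can also be assembled from strings, which can help when
# the predictor names are only known at run time (hypothetical names here).
predictors <- c("x1", "x2", "x3")
g <- as.formula(paste("y ~", paste(predictors, collapse = " + ")))
g
```

Building the formula from strings is only one option; for a fixed set of predictors, writing the formula literally is usually clearer.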

As an example, let’s use a linear regression to estimate the relationship between stopping distance (dist) and speed in the cars data set that ships with R. The formula is dist ~ speed. We’ll use the lm function to estimate the parameters of a linear model. The lm function returns an object of class lm, which we will assign to a variable called cars.lm:

> cars.lm <- lm(formula = dist ~ speed, data = cars)

Now, let’s take a quick look at the results returned:

> cars.lm

Call:
lm(formula = dist ~ speed, data = cars)

Coefficients:
(Intercept)        speed  
    -17.579        3.932

As you can see, printing an lm object shows you the original function call (and thus the data set and formula) and the estimated coefficients. For some more information, ...
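To connect the printed coefficients back to the linear equation, here is a small sketch that extracts them with coef() and compares a prediction computed by hand with the result of predict(). The speed value of 20 is an arbitrary choice for illustration, not something taken from the text.

```r
# Refit the model from the text; cars is in R's built-in datasets package.
cars.lm <- lm(formula = dist ~ speed, data = cars)

# Named numeric vector holding the two estimates: (Intercept) and speed.
b <- coef(cars.lm)

# Prediction by hand: intercept + slope * speed, for an illustrative speed of 20.
by.hand <- b[["(Intercept)"]] + b[["speed"]] * 20

# The same prediction through predict(), which expects a data frame
# whose column names match the predictors in the formula.
by.predict <- predict(cars.lm, newdata = data.frame(speed = 20))

# The two approaches should agree up to floating-point tolerance.
all.equal(by.hand, unname(by.predict))
```

With the coefficients shown above, the hand calculation is roughly -17.579 + 3.932 * 20, or about 61.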
