Models and Formulas
To statisticians, a model is a concise way to describe a set of data, usually with a mathematical formula. Sometimes, the goal is to build a predictive model with training data to predict values based on other data. Other times, the goal is to build a descriptive model that helps you understand the data better.
R has a special notation for describing relationships between variables. Suppose that you are assuming a linear model for a variable y, predicted from the variables x1, x2, ..., xn. (Statisticians usually refer to y as the dependent variable, and x1, x2, ..., xn as the independent variables.) In equation form, this implies a relationship like:

$$y = c_0 + c_1 x_1 + c_2 x_2 + \cdots + c_n x_n + \varepsilon$$

where c0, c1, ..., cn are coefficients to be estimated and ε is an error term.
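In R, you would write the relationship as y ~ x1 + x2 + ... + xn, which is a formula object. The formula names only the variables: the intercept and the coefficients are implied, and they are what a model-fitting function such as lm estimates.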
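A formula is itself an R object that you can store, inspect, and build programmatically. The following is a minimal sketch using only base R (the stats package, which is attached by default). The names f, g, and predictors are illustrative, and no data for y, x1, x2, or x3 needs to exist, because ~ does not evaluate its operands.

```r
# A formula is an ordinary R object; ~ records its operands without evaluating them.
f <- y ~ x1 + x2 + x3

class(f)      # "formula"
all.vars(f)   # "y" "x1" "x2" "x3"

# Build an equivalent formula from variable names held in strings,
# e.g. when the set of predictors is chosen by other code.
predictors <- c("x1", "x2", "x3")
g <- reformulate(predictors, response = "y")
g             # y ~ x1 + x2 + x3

# terms() lists the model terms; the intercept is implied, not listed.
identical(attr(terms(f), "term.labels"), attr(terms(g), "term.labels"))  # TRUE
```

Building formulas with reformulate() avoids pasting strings together and calling as.formula() by hand when the predictors are not known in advance.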
So, let’s try to use a linear regression to estimate the relationship between stopping distance and speed in the cars data set that ships with R (speed in mph, stopping distance dist in feet). The formula is dist~speed. We’ll use the lm function to estimate the parameters of a linear model. The lm function returns an object of class lm, which we will assign to a variable called cars.lm:
> cars.lm <- lm(formula=dist~speed,data=cars)
Now, let’s take a quick look at the results returned:
> cars.lm

Call:
lm(formula = dist ~ speed, data = cars)

Coefficients:
(Intercept)        speed  
    -17.579        3.932  
As you can see, printing an lm object shows you the original function call (and thus the data set and formula) and the estimated coefficients. For more information, ...
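To see where the printed coefficients come from, here is a minimal sketch that recomputes them for this one-predictor model using the closed-form least-squares estimates (slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)) and compares them with what lm reports. It relies on the standard stats functions coef() and predict(); the names b, slope, intercept, and new.speeds are illustrative.

```r
# Fit the model from the text; cars is in R's built-in datasets package.
cars.lm <- lm(dist ~ speed, data = cars)

# coef() returns the estimates as a named numeric vector:
# "(Intercept)" and "speed".
b <- coef(cars.lm)
b

# Closed-form ordinary least squares for a single predictor.
slope     <- cov(cars$speed, cars$dist) / var(cars$speed)
intercept <- mean(cars$dist) - slope * mean(cars$speed)

# The hand-computed values agree with lm() up to floating-point error.
all.equal(unname(b), c(intercept, slope))   # TRUE

# predict() evaluates the fitted line at new speeds (mph),
# i.e. intercept + slope * speed.
new.speeds <- data.frame(speed = c(10, 20))
predict(cars.lm, newdata = new.speeds)
```

The closed form applies only to a single predictor; with several predictors, lm solves the general least-squares problem, but coef() and predict() are used the same way.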