Chapter 11. Linear Regression and ANOVA

Introduction

In statistics, modeling is where we get down to business. Models quantify the relationships between our variables. Models let us make predictions.

A simple linear regression is the most basic model. It involves just two variables, x and y, and models their relationship as a straight line plus an error term:

y_i = β₀ + β₁x_i + ε_i

We are given the data for x and y. Our mission is to fit the model, which will give us the best estimates for β₀ and β₁ (Recipe 11.1).
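Here is a minimal sketch of fitting the simple model in R. It assumes simulated data: the true values β₀ = 3 and β₁ = 2, the sample size, and the noise level are invented for illustration. In this two-variable case the least-squares estimates also have a closed form, β̂₁ = cov(x, y) / var(x) and β̂₀ = ȳ − β̂₁x̄, so the sketch compares that formula with what lm reports.

```r
set.seed(42)                          # reproducible simulated data
x <- runif(100, min = 0, max = 10)    # hypothetical predictor values
y <- 3 + 2 * x + rnorm(100, sd = 1)   # true beta0 = 3, beta1 = 2, plus noise

m <- lm(y ~ x)                        # fit y_i = beta0 + beta1 * x_i + e_i
print(coef(m))                        # estimated intercept and slope

# Closed-form least-squares estimates for the two-variable case
b1 <- cov(x, y) / var(x)
b0 <- mean(y) - b1 * mean(x)
print(c(intercept = b0, slope = b1))  # agrees with coef(m) up to rounding
```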

Simple linear regression generalizes naturally to multiple linear regression, where several variables appear on the right-hand side of the relationship (Recipe 11.2):

y_i = β₀ + β₁u_i + β₂v_i + β₃w_i + ε_i
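The multiple-regression version only adds terms to the model formula. This sketch again uses made-up data (the coefficients 1, 0.5, −2, and 3 are arbitrary). As a cross-check, it also solves the normal equations (XᵀX)β = Xᵀy directly. lm itself uses a QR decomposition, which is more numerically stable, so treat the explicit solve as an illustration of what the estimates are rather than a recommended way to compute them.

```r
set.seed(1)
n <- 200
u <- rnorm(n); v <- rnorm(n); w <- rnorm(n)      # hypothetical predictors
y <- 1 + 0.5 * u - 2 * v + 3 * w + rnorm(n, sd = 0.5)

m <- lm(y ~ u + v + w)   # intercept plus one coefficient per predictor
print(coef(m))

# Cross-check via the normal equations (X'X) beta = X'y
X <- cbind(1, u, v, w)   # design matrix; first column is the intercept
beta <- drop(solve(crossprod(X), crossprod(X, y)))
print(beta)
```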

Statisticians call u, v, and w the predictors and y the response. Obviously, the model is useful only if there is a fairly linear relationship between the predictors and the response, but that requirement is much less restrictive than you might think. Recipe 11.11 discusses transforming your variables into a (more) linear relationship so that you can use the well-developed machinery of linear regression.
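As one common case, suppose the relationship is exponential with multiplicative noise. The constants 5 and 0.3 are invented, and this is not necessarily the transformation Recipe 11.11 covers. Taking the log of y gives log(y) = log(5) + 0.3x + ε, which is an ordinary linear model, so lm can fit it directly.

```r
set.seed(7)
x <- seq(1, 10, length.out = 50)
# Hypothetical data: y = 5 * exp(0.3 * x) * exp(e), e ~ N(0, 0.1^2),
# so log(y) = log(5) + 0.3 * x + e is linear in x
y <- 5 * exp(0.3 * x) * exp(rnorm(50, sd = 0.1))

m_raw <- lm(y ~ x)        # straight line through curved data
m_log <- lm(log(y) ~ x)   # linear model for the transformed response

print(summary(m_raw)$r.squared)
print(summary(m_log)$r.squared)            # expected to be higher than for m_raw
print(exp(coef(m_log)[["(Intercept)"]]))   # back-transformed intercept, near 5
```

The slope of m_log is on the log scale. Exponentiating the intercept recovers the multiplicative constant.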

The beauty of R is that anyone can build these linear models. The models are built by a function, lm, which returns a model object. From the model object, we get the coefficients (β_i) and regression statistics. It’s easy.
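This short sketch shows how to work with the returned object, using a hypothetical data frame dat with columns x and y filled with simulated values. The extractor functions coef, residuals, fitted, and predict are standard for lm objects.

```r
set.seed(3)
# Hypothetical data frame; the names dat, x, and y are illustrative
dat <- data.frame(x = 1:20)
dat$y <- 4 + 1.5 * dat$x + rnorm(20)

m <- lm(y ~ x, data = dat)   # returns an object of class "lm"

print(class(m))              # "lm"
print(coef(m))               # the estimated beta_i
print(head(residuals(m)))    # observed y minus fitted value
print(head(fitted(m)))

# Predictions at new predictor values; newdata needs a column named x
new <- data.frame(x = c(25, 30))
print(predict(m, newdata = new))
```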

The horror of R is that anyone can build these models. Nothing requires you to check that the model is reasonable, much less statistically significant. Before you blindly believe a model, check it. Most of the ...
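Here are a few basic checks, sketched on deliberately meaningless data: x and y are two independent columns of random noise, an assumption chosen to show the danger. lm still returns a slope without complaint. summary is what lets you judge whether the model explains anything. The overall F-test p-value is not stored in the summary object, so the sketch computes it from the stored F statistic. Calling plot(m) additionally draws the standard residual diagnostic plots.

```r
set.seed(11)
# Hypothetical pure noise: y has no relationship to x at all
x <- rnorm(50)
y <- rnorm(50)

m <- lm(y ~ x)          # lm fits it anyway and reports a slope
print(coef(m))

s <- summary(m)
print(s$r.squared)      # share of variance in y explained by the model
print(coef(s))          # estimates, std. errors, t values, p-values

# Overall F-test p-value, computed from the stored F statistic
fs <- s$fstatistic      # named vector: value, numdf, dendf
p_overall <- pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE)
print(unname(p_overall))
```

With a single predictor, F equals the square of the slope's t statistic, so the overall p-value matches the slope's p-value in coef(s).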

R Cookbook by Paul Teetor