7 Regression

Regression analysis is the statistical method you use when both the response variable and the explanatory variable are continuous variables (i.e. real numbers with decimal places – things like heights, weights, volumes, or temperatures). Perhaps the easiest way of knowing when regression is the appropriate analysis is to see that a scatterplot is the appropriate graphic (in contrast to analysis of variance, say, where the appropriate plot would be a box-and-whisker plot or a bar chart).

The essence of regression analysis is using sample data to estimate parameter values and their standard errors. First, however, we need to select a model which describes the relationship between the response variable and the explanatory variable(s). There are literally hundreds of models from which we might choose. Perhaps the most important thing to learn about regression is that model choice is a really big deal. The simplest model of all is the linear model:

$$y = a + bx$$

The response variable is y, and x is a continuous explanatory variable. There are two parameters, a and b: the intercept is a (the value of y when x = 0); and the slope is b (the slope, or gradient, is the change in y divided by the change in x which brought it about). The slope is so important that it is worth drawing a picture to make clear what is involved.
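Before the picture, it may help to see the two definitions as arithmetic. The sketch below is only an illustration, not the book's own code: the function name `slope_and_intercept` and the two points are hypothetical, chosen so that the line has a negative slope. It takes two points on a straight line, computes b as the change in y divided by the change in x, and then recovers a as the value of y when x = 0.

```python
def slope_and_intercept(p1, p2):
    """Return (a, b) for the straight line y = a + b*x through two points.

    Hypothetical helper for illustration only.
    """
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # A vertical line has no finite slope.
        raise ValueError("the two x values must differ")
    b = (y2 - y1) / (x2 - x1)  # slope: change in y divided by change in x
    a = y1 - b * x1            # intercept: the value of y when x = 0
    return a, b


# Two made-up points on a line with a negative slope (assumed values).
a, b = slope_and_intercept((2.0, 10.0), (8.0, 4.0))
print(f"intercept a = {a}, slope b = {b}")  # intercept a = 12.0, slope b = -1.0
```

Here y falls by 6 while x rises by 6, so b = -1, and tracing the line back to x = 0 gives a = 12. With real sample data the points do not lie exactly on a line, which is why a and b have to be estimated along with their standard errors.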

The task is to work out the slope and intercept of this negative ...

