9

REGRESSION

In the Black Spruce Seedlings Case Study in Section 1.9, the biologist was interested in how much the seedlings grew over the course of the study. Let (x1, y1), (x2, y2), ..., (x72, y72) denote the change in height and the change in diameter, respectively, for each of the 72 seedlings. In Figure 9.1, we see that there is a strong, positive, linear relationship between the height and diameter changes.

In this chapter, we will describe a method to model this relationship; that is, we will find a mathematical equation that best explains the linear relationship between the change in height and the change in diameter.

9.1 COVARIANCE

In Chapter 2, we introduced the scatter plot as a graphical tool to explore the relationship between two numeric variables. For example, referring to Figure 9.2a, we might describe the relationship between the two variables as positive, linear, and moderate to moderately strong.

Now, consider the graph in Figure 9.2b. How would you describe the relationship here? This relationship would be described as linear, positive, and strong.

In fact, these two graphs are of the same two variables! The difference between the two impressions is due to the y-axis scaling. In the first graph, the range of the y-axis is roughly −2.5 to 2.5; in the second graph, the y-axis range is roughly −4.5 to 4.5. Graphs are excellent tools for exploring data, but issues such as scaling can distort our perception of underlying properties and relationships. Thus, we will consider a numeric ...
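The point about axis scaling can be illustrated with a short sketch. Taking the section title as a cue, the Python snippet below computes a sample covariance directly from paired data. The function name `sample_covariance`, the n − 1 divisor, and the four data points are assumptions made for illustration; they are not taken from the text or from the data behind Figure 9.2. What the sketch shows is that the plotting range of the y-axis is never an input, so the result cannot change when the same data are redrawn on different axes.

```python
def sample_covariance(xs, ys):
    """Sample covariance of paired data, using the n - 1 divisor (an assumption)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Average product of deviations from the two sample means.
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Made-up illustrative data, not the data plotted in Figure 9.2.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 9]

# Only the data enter the calculation; no axis limits appear anywhere.
cov_xy = sample_covariance(xs, ys)
print(cov_xy)
```

For these four points the products of deviations sum to 11, so the result is 11/3. A positive value is consistent with an upward trend in the scatter plot, whichever axis range is used to draw it.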
