# Nonlinear Relationships Between Columns: Beyond Straight Lines

While we told you the truth in Chapter 5 when we said that linear regression assumes that the relationship between two variables is a straight line, it turns out you can also use linear regression to capture relationships that aren’t well described by a straight line. To show you what we mean, imagine that you have the data shown in panel A of Figure 6-1.

It’s obvious from looking at this scatterplot that the relationship between `x` and `y` isn’t well described by a straight line. Indeed, plotting the regression line shows us exactly what goes wrong if we try to use a line to capture the pattern in this data; panel B of Figure 6-1 shows the result.

We can see that we make systematic errors in our predictions if we use a straight line: at small and large values of `x`, we overpredict `y`, and at medium values of `x`, we underpredict `y`. This is easiest to see in a residuals plot, as shown in panel C of Figure 6-1. The residuals plot still shows all of the structure of the original data set, because the default linear regression model captures none of it.
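To make the pattern of systematic errors concrete, here is a minimal sketch in base R. The data is synthetic and is our own assumption for illustration (an inverted parabola plus noise), not necessarily the data behind Figure 6-1, and the variable names are hypothetical. It fits a straight line with `lm` and compares the average residual at extreme values of `x` with the average residual in the middle.

```r
# Synthetic data (an assumption for illustration, not the data set behind
# Figure 6-1): y follows an inverted parabola in x, plus Gaussian noise.
set.seed(1)
x <- seq(-10, 10, by = 0.01)
y <- 1 - x^2 + rnorm(length(x), mean = 0, sd = 5)

# Fit an ordinary straight-line regression of y on x.
linear_fit <- lm(y ~ x)

# Residuals are observed values minus fitted values, so a negative
# residual means the line overpredicted y and a positive one means it
# underpredicted y.
res <- residuals(linear_fit)

# Average residual at small and large x versus medium x.
mean_res_extreme <- mean(res[abs(x) > 8])
mean_res_middle <- mean(res[abs(x) < 2])

# Share of the variance in y that the straight line explains.
r_squared <- summary(linear_fit)$r.squared

print(c(extreme = mean_res_extreme, middle = mean_res_middle,
        r_squared = r_squared))
```

With data of this shape, the average residual should come out negative at the extremes and positive in the middle, and the R-squared should be close to zero. Calling `plot(x, res)` would draw the corresponding residuals plot, which keeps the curved shape of the raw data.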

Using `ggplot2`’s `geom_smooth` function ...
