Chapter 8 Simple Linear Regression

Regression modeling represents a powerful and elegant method for estimating the value of a continuous target variable. In this chapter, we introduce regression modeling through simple linear regression, where a straight line is used to approximate the relationship between a single continuous predictor variable and a single continuous response variable. Later, in Chapter 9, we turn to multiple regression, where several predictor variables are used to estimate a single response.

8.1 An Example of Simple Linear Regression

To develop the simple linear regression model, consider the Cereals data set,1 an excerpt of which is presented in Table 8.1. The Cereals data set contains nutritional information for 77 breakfast cereals, and includes the following variables:

  • Cereal name
  • Cereal manufacturer
  • Type (hot or cold)
  • Calories per serving
  • Grams of protein
  • Grams of fat
  • Milligrams of sodium
  • Grams of fiber
  • Grams of carbohydrates
  • Grams of sugar
  • Milligrams of potassium
  • Percentage of recommended daily allowance of vitamins (0%, 25%, or 100%)
  • Weight of one serving
  • Number of cups per serving
  • Shelf location (1 = bottom, 2 = middle, 3 = top)
  • Nutritional rating, as calculated by Consumer Reports.

Table 8.1 Excerpt from Cereals data set: eight fields, first 16 cereals

Cereal Name        Manufacturer  Sugars  Calories  Protein  Fat  Sodium  Rating
100% Bran          N             6       70        4        1    130     68.4030
100% Natural Bran  Q             8       120       3        5    15      33.9837
All-Bran           K             5       70        4        1    260     59.4255
All-Bran ...
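
To make the straight-line idea concrete, the following is a minimal sketch that fits a line to the three complete rows visible in Table 8.1. It rests on two assumptions not stated in the text so far: that sugars serves as the predictor and nutritional rating as the response, and that the line is chosen by ordinary least squares. The function name fit_simple_linear_regression is hypothetical and used only for illustration.

```python
# Illustrative sketch only: ordinary least squares with a single predictor.
# Assumption: sugars is the predictor and rating is the response; this
# pairing is chosen here for illustration.

def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")

    x_mean = sum(x) / n
    y_mean = sum(y) / n

    # Sum of squared deviations of x, and sum of cross-products of deviations.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# The three complete rows shown in Table 8.1:
# 100% Bran, 100% Natural Bran, All-Bran.
sugars = [6, 8, 5]
rating = [68.4030, 33.9837, 59.4255]

b0, b1 = fit_simple_linear_regression(sugars, rating)
print(f"estimated rating = {b0:.4f} + ({b1:.4f}) * sugars")
```

Three points are far too few for a meaningful fit, so the resulting coefficients only illustrate the mechanics. A fit on all 77 cereals would generally give different values.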

