Residuals

The errors (or mistakes) that our model makes are called error terms or residuals, and are denoted in our univariate linear regression equation by εᵢ. Our goal, therefore, is to choose the regression coefficients for the independent variables (in our case, β1) that make these residuals as small as possible overall. To compute the i-th residual, we simply subtract the predicted value from the actual value, as illustrated in Figure 4.1. To quantify the quality of our regression line, and hence of our regression model, we can use a metric called the Sum of Squared Errors (SSE), which is the sum of all squared residuals over the n observations, where yᵢ is the actual value and ŷᵢ the predicted value:

$$\mathrm{SSE} = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$

A smaller SSE implies a better ...
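To make the residual and SSE definitions concrete, here is a minimal plain-Python sketch (not Spark, and not the book's own code). It assumes a univariate model of the form ŷ = β0 + β1·x; the toy data and the helper names (`predict`, `residuals`, `sse`, `fit_least_squares`) are hypothetical and exist only for illustration. The coefficients are found with the standard closed-form ordinary least squares solution, which by construction picks the line with the smallest SSE.

```python
# Minimal sketch: residuals and SSE for a univariate linear model y_hat = b0 + b1 * x.
# The toy data and all function names here are hypothetical, for illustration only.

def predict(x, b0, b1):
    """Predicted value y_hat for a single input x."""
    return b0 + b1 * x

def residuals(xs, ys, b0, b1):
    """i-th residual = actual value minus predicted value."""
    return [y - predict(x, b0, b1) for x, y in zip(xs, ys)]

def sse(xs, ys, b0, b1):
    """Sum of Squared Errors: the sum of all squared residuals."""
    return sum(e * e for e in residuals(xs, ys, b0, b1))

def fit_least_squares(xs, ys):
    """Closed-form ordinary least squares for one feature.

    Returns the (b0, b1) pair that minimizes SSE on the given data.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Covariance-style numerator and variance-style denominator (unnormalized).
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical toy data set.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]

b0, b1 = fit_least_squares(xs, ys)
ols_sse = sse(xs, ys, b0, b1)          # SSE of the least-squares line
candidate_sse = sse(xs, ys, 0.0, 2.0)  # SSE of an arbitrary alternative line

print(b0, b1)
print(residuals(xs, ys, b0, b1))
print(ols_sse, candidate_sse)
```

On this toy data the least-squares line is roughly ŷ = 1.9x, with residuals of about 0.1, 0.2, -0.7 and 0.4 and an SSE of about 0.7, while the hand-picked line ŷ = 2x has an SSE of 1.0. The lower SSE is exactly what marks the least-squares line as the better fit of the two.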
