Gradient Descent

To perform OLS regression using the gradient descent (GD) algorithm, we have to minimize a cost function that measures how well a given regression line fits the data.

This cost function takes an (α, β) pair and returns an error value based on how well the corresponding line fits our data. To compute this error for a given line, we iterate through each (x, y) point in our dataset and sum the squared differences between each point's y value and the candidate line's predicted y value (computed as αx + β).

The cost function looks like the following:

Error(α, β) = (1/N) Σᵢ (yᵢ − (α·xᵢ + β))²
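As a minimal sketch (in Python rather than R, purely for illustration; the function and variable names are our own), the error computation described above can be written as:

```python
def compute_error(alpha, beta, points):
    """Mean squared error of the line y = alpha * x + beta over the data.

    `points` is an iterable of (x, y) pairs. For each point we take the
    difference between the observed y and the line's prediction, square it,
    and average the results.
    """
    total = 0.0
    for x, y in points:
        total += (y - (alpha * x + beta)) ** 2
    return total / len(points)
```

A line that passes exactly through every point yields an error of zero; the worse the fit, the larger the returned value.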

If we plot the loss function (z axis) against the coefficients of the linear regression (α and β), the resulting 3D surface is a convex bowl: its lowest point corresponds to the (α, β) pair that minimizes the error, and gradient descent walks downhill along this surface toward that minimum.
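The descent itself can be sketched as follows (again in Python for illustration; the step function, learning rate, and iteration count are assumptions, not the book's implementation). Each step evaluates the partial derivatives of the mean squared error with respect to α and β and moves both coefficients a small distance against the gradient:

```python
def step_gradient(alpha, beta, points, lr):
    """One gradient-descent update on the mean-squared-error cost."""
    n = len(points)
    grad_a = 0.0  # d(error)/d(alpha)
    grad_b = 0.0  # d(error)/d(beta)
    for x, y in points:
        residual = y - (alpha * x + beta)
        grad_a += -(2.0 / n) * x * residual
        grad_b += -(2.0 / n) * residual
    # Move against the gradient, scaled by the learning rate.
    return alpha - lr * grad_a, beta - lr * grad_b

def gradient_descent(points, lr=0.05, iterations=5000):
    """Iterate the update from (0, 0) and return the fitted (alpha, beta)."""
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        alpha, beta = step_gradient(alpha, beta, points, lr)
    return alpha, beta
```

On data generated from y = 2x + 1, for example, the loop converges to α ≈ 2 and β ≈ 1. The learning rate must be small enough that each step does not overshoot the bowl's minimum; too large a value makes the iterates diverge.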
