At the core of linear regression, there is the search for a line's equation that it is able to minimize the sum of the squared errors of the difference between the line's y values and the original ones. As a reminder, let's say our regression function is called
h, and its predictions
h(X), as in this formulation:
Consequently, our cost function to be minimized is as follows:
There are quite a few methods to minimize it, some performing better than others in the presence of large quantities of data. Among the better performers, ...