Regularization on linear models

The Stochastic Gradient Descent algorithm (SGD) finds the optimal weights {wi} of the model by minimizing the error between the true and the predicted values on the N training samples:

Where  are the predicted values, ŷi the real values to be predicted; we have N samples, and each sample has n dimensions.

Regularization consists of adding a term to the previous equation and to minimize the regularized error:

Get Effective Amazon Machine Learning now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.