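Lasso regularization is based on the L1-norm of the parameter vector, that is, the sum of the absolute values of its components:

$$\lVert \bar{w} \rVert_1 = \sum_i |w_i|$$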
Unlike Ridge, which shrinks all the weights, Lasso can drive the smallest ones exactly to zero, producing a sparse parameter vector. The mathematical proof is beyond the scope of this book; however, it's possible to understand this behavior intuitively by considering the following two-dimensional diagram:
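The difference between shrinking and zeroing can also be seen numerically. The following is a minimal sketch, not the book's own code, that assumes an orthonormal design matrix (X^T X = I) and the objectives 0.5*||y - Xw||^2 + lam*||w||_1 (Lasso) and 0.5*||y - Xw||^2 + 0.5*lam*||w||^2 (Ridge). Under that assumption both problems have closed-form solutions in terms of the ordinary least-squares coefficients: Lasso soft-thresholds them, while Ridge rescales them all by the same factor. The function names and the coefficient values are hypothetical, chosen only for illustration.

```python
# Minimal sketch, assuming an orthonormal design matrix (X^T X = I), so that
# the OLS coefficients are simply X^T y and both penalized problems have
# closed-form solutions:
#   Lasso: min 0.5*||y - Xw||^2 + lam * sum(|w_i|)   -> soft-thresholding
#   Ridge: min 0.5*||y - Xw||^2 + 0.5*lam*||w||^2    -> uniform shrinkage


def lasso_orthonormal(w_ols, lam):
    """Soft-threshold each OLS coefficient: those with |w| <= lam become exactly 0."""
    out = []
    for w in w_ols:
        magnitude = max(abs(w) - lam, 0.0)
        if magnitude == 0.0:
            out.append(0.0)  # avoid a signed -0.0 in the output
        else:
            out.append(magnitude if w > 0 else -magnitude)
    return out


def ridge_orthonormal(w_ols, lam):
    """Scale every OLS coefficient by 1 / (1 + lam): small, but never exactly zero."""
    return [w / (1.0 + lam) for w in w_ols]


def count_zeros(w):
    """Number of coefficients that are exactly zero (the sparsity of the vector)."""
    return sum(1 for v in w if v == 0.0)


if __name__ == "__main__":
    # Hypothetical OLS coefficients (illustration only)
    w_ols = [3.0, -0.4, 0.2, -2.5, 0.05]
    lam = 0.5

    w_lasso = lasso_orthonormal(w_ols, lam)
    w_ridge = ridge_orthonormal(w_ols, lam)

    print("Lasso:", w_lasso, "zeros:", count_zeros(w_lasso))
    print("Ridge:", w_ridge, "zeros:", count_zeros(w_ridge))
```

With these values, Lasso returns [2.5, 0.0, 0.0, -2.0, 0.0], so the three smallest coefficients are removed, while Ridge only scales every coefficient by 1/1.5 and leaves none of them at zero. The orthonormal case was chosen because it makes the thresholding explicit; with a general design matrix, Lasso needs an iterative solver (for example, coordinate descent), but the sparsifying effect is the same.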
The zero-centered square represents the Lasso constraint boundary. If we consider a generic line, the probability of it being tangential to the square is higher at the ...