O'Reilly logo

scikit-learn Cookbook by Trent Hauck

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Feature selection on L1 norms

We're going to work with some ideas similar to those we saw in the recipe on Lasso Regression. In that recipe, we looked at the number of features that had zero coefficients.

Now we're going to take this a step further and use the spareness associated with L1 norms to preprocess the features.

Getting ready

We'll use the diabetes dataset to fit a regression. First, we'll fit a basic LinearRegression model with a ShuffleSplit cross validation. After we do that, we'll use LassoRegression to find the coefficients that are 0 when using an L1 penalty. This hopefully will help us to avoid overfitting, which means that the model is too specific to the data it was trained on. To put this another way, the model, if overfit, does ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required