We're going to work with some ideas similar to those we saw in the recipe on Lasso Regression. In that recipe, we looked at the number of features that had zero coefficients. Now we're going to take this a step further and use the sparsity associated with L1 norms to preprocess the features.
We'll use the diabetes dataset to fit a regression. First, we'll fit a basic LinearRegression model with ShuffleSplit cross-validation. After we do that, we'll use Lasso regression to find the coefficients that are driven to 0 by the L1 penalty. This should help us avoid overfitting, which means the model is too specific to the data it was trained on. To put this another way, the model, if overfit, does ...
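The workflow just described might be sketched as follows. This is a minimal illustration, not the recipe's full code; the `alpha` value for the Lasso is an assumed choice, and the number of zeroed coefficients will vary with it.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.model_selection import ShuffleSplit, cross_val_score

X, y = load_diabetes(return_X_y=True)

# Baseline: score a plain LinearRegression with ShuffleSplit cross-validation.
cv = ShuffleSplit(n_splits=10, test_size=0.25, random_state=0)
baseline = cross_val_score(LinearRegression(), X, y, cv=cv)
print("mean R^2, all features:", baseline.mean())

# Fit a Lasso (L1 penalty) and count the coefficients it drives to 0.
# alpha=0.5 is an illustrative value, not tuned.
lasso = Lasso(alpha=0.5).fit(X, y)
zeroed = np.sum(lasso.coef_ == 0)
print("coefficients set to 0:", zeroed)
```

The features whose coefficients survive (are nonzero) are the ones we would keep when using the L1 penalty as a preprocessing step.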