Bagging to improve results

Bootstrap aggregating, or bagging, is an ensemble algorithm introduced by Leo Breiman in 1994 that applies bootstrapping to machine learning problems. Bagging is also mentioned in the Learning with random forests recipe.

The algorithm aims to reduce overfitting (that is, model variance) with the following steps:

  1. Generate new training sets from the input training data by sampling with replacement.
  2. Fit a model to each generated training set.
  3. Combine the results of the models by averaging (for regression) or majority voting (for classification).
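The three steps above can be sketched by hand with NumPy and a decision tree as the base model (the toy dataset and the number of rounds are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy classification data for illustration.
X, y = make_classification(n_samples=200, random_state=0)

rng = np.random.default_rng(0)
models = []
for _ in range(10):
    # Step 1: sample with replacement to form a bootstrap training set.
    idx = rng.integers(0, len(X), size=len(X))
    # Step 2: fit a model to the resampled set.
    models.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Step 3: combine the models' predictions by majority vote.
votes = np.array([model.predict(X) for model in models])
majority = (votes.mean(axis=0) >= 0.5).astype(int)
```

Each tree sees a slightly different resampled dataset, so their individual errors tend to cancel out in the vote.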

The scikit-learn BaggingClassifier class allows us to bootstrap training examples, and we can also bootstrap features, as in the random forests algorithm. When we perform a grid search, we refer to hyperparameters of the base estimator with the ...
