O'Reilly logo

Python Machine Learning Cookbook by Prateek Joshi

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Tackling class imbalance

Until now, we dealt with problems where we had a similar number of datapoints in all our classes. In the real world, we might not be able to get data in such an orderly fashion. Sometimes, the number of datapoints in one class is a lot more than the number of datapoints in other classes. If this happens, then the classifier tends to get biased. The boundary won't reflect of the true nature of your data just because there is a big difference in the number of datapoints between the two classes. Therefore, it becomes important to account for this discrepancy and neutralize it so that our classifier remains impartial.

How to do it…

  1. Let's load the data:
    input_file = 'data_multivar_imbalance.txt' X, y = utilities.load_data(input_file) ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required