Class unbalance is a problem we came across in Chapter 7, Fraud and Anomaly Detection, where the goal was to detect fraudulent insurance claims. The challenge is that a very large part of the dataset, usually more than 90%, describes normal activities, and only a small fraction of the dataset contains fraudulent examples. In such a case, a model that always predicts normal is correct more than 90% of the time, yet it never detects a single fraudulent example, so accuracy alone is a misleading measure of its quality. This problem is extremely common in practice and can be observed in various applications, including fraud detection, anomaly detection, medical diagnosis, oil spillage detection, and facial recognition.
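To make the effect concrete, the following minimal sketch evaluates a trivial model that always predicts the normal class. The data is synthetic and purely illustrative: we assume 90 normal examples (label 0) and 10 fraudulent ones (label 1), and the `evaluate` helper is a hypothetical function written for this illustration, not part of any library.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return (accuracy, recall) for the given positive class label."""
    # Count predictions that match the true label.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    # True positives: fraudulent examples that were flagged as fraudulent.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    # False negatives: fraudulent examples that were predicted as normal.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    accuracy = correct / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return accuracy, recall


# Assumed synthetic labels: 90 normal (0) and 10 fraudulent (1) examples.
y_true = [0] * 90 + [1] * 10

# A trivial "model" that always predicts the normal class.
y_pred = [0] * len(y_true)

accuracy, recall = evaluate(y_true, y_pred)
print(f"accuracy = {accuracy:.2f}, recall on fraud = {recall:.2f}")
```

Under these assumptions, the accuracy looks high even though the recall on the fraudulent class is zero, which is why measures computed on the minority class are usually reported alongside accuracy for unbalanced datasets.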
Now that we know what the class unbalance problem is and why it matters, let's take a look at how to deal with this ...