Dataset rebalancing

As the number of negative examples, that is, instances of fraud, is very small compared to positive examples, the learning algorithms struggle with induction. We can help them by giving them a dataset where the share of positive and negative examples is comparable. This can be achieved with dataset rebalancing.

Weka has a built-in filter, Resample, which produces a random subsample of a dataset, using sampling either with replacement or without replacement. The filter can also bias the distribution toward a uniform class distribution.

We will proceed by manually implementing k-fold cross-validation. First, we will split the dataset into k equal folds. Fold k will be used for testing, while the other folds will be used ...

Get Machine Learning in Java - Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.