Modeling suspicious patterns

To design a classifier, we can follow the standard supervised learning steps, as described in Chapter 1, Applied Machine Learning Quick Start. In this recipe, we will include some additional steps to handle unbalanced datasets and evaluate classifiers based on precision and recall. The plan is as follows:

  1. Load the data from the .csv file.
  2. Assign the class attribute.
  3. Convert all of the attributes from numeric to nominal values, so that no value is incorrectly loaded as a number.
  4. Experiment 1: Evaluate the models with k-fold cross-validation.
  5. Experiment 2: Rebalance the dataset to a more even class distribution, and perform cross-validation manually.
  6. Compare the classifiers by recall, precision, ...
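
Two of the ideas in the plan above, rebalancing the class distribution and scoring a classifier by precision and recall, can be illustrated with a minimal sketch in plain Java. This is not the recipe's own code and does not use any machine learning library; the class name `ImbalanceSketch`, the method names, the undersampling ratio, and the confusion-matrix counts in `main` are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ImbalanceSketch {

    // Precision: the fraction of instances flagged as suspicious that truly are.
    public static double precision(int truePositives, int falsePositives) {
        int flagged = truePositives + falsePositives;
        return flagged == 0 ? 0.0 : (double) truePositives / flagged;
    }

    // Recall: the fraction of truly suspicious instances that were flagged.
    public static double recall(int truePositives, int falseNegatives) {
        int actual = truePositives + falseNegatives;
        return actual == 0 ? 0.0 : (double) truePositives / actual;
    }

    // Random undersampling (one possible rebalancing strategy, assumed here):
    // keep at most ratio * minority.size() majority instances and all of the
    // minority instances, then shuffle the combined list.
    public static <T> List<T> undersample(List<T> majority, List<T> minority,
                                          double ratio, long seed) {
        List<T> shuffled = new ArrayList<>(majority);
        Collections.shuffle(shuffled, new Random(seed));
        int keep = (int) Math.min(shuffled.size(), Math.round(ratio * minority.size()));
        List<T> balanced = new ArrayList<>(shuffled.subList(0, keep));
        balanced.addAll(minority);
        Collections.shuffle(balanced, new Random(seed));
        return balanced;
    }

    public static void main(String[] args) {
        // Hypothetical counts: 1,000 instances, of which 50 are suspicious.
        int tp = 30, fn = 20, fp = 10, tn = 940;

        double accuracy = (double) (tp + tn) / (tp + tn + fp + fn);
        System.out.printf("accuracy=%.2f precision=%.2f recall=%.2f%n",
                accuracy, precision(tp, fp), recall(tp, fn));

        // Rebalance a toy dataset to roughly two normal instances per suspicious one.
        List<String> normal = Collections.nCopies(950, "normal");
        List<String> suspicious = Collections.nCopies(50, "suspicious");
        List<String> balanced = undersample(normal, suspicious, 2.0, 42L);
        System.out.println("balanced size=" + balanced.size());
    }
}
```

With these made-up counts, accuracy looks high even though a large share of the suspicious instances is missed, which is why the plan compares classifiers on precision and recall rather than on accuracy alone. Undersampling is only one way to rebalance; the ratio of 2.0 is an arbitrary choice for the sketch.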
