Preprocessing data

We deal with a lot of raw data in the real world. Machine learning algorithms expect data to be formatted in a certain way before they start the training process. In order to prepare the data for ingestion by machine learning algorithms, we have to preprocess it and convert it into the right format. Let's see how to do it.

Create a new Python file and import the following packages:

import numpy as np 
from sklearn import preprocessing 

Let's define some sample data:

input_data = np.array([[5.1, -2.9, 3.3], 
                       [-1.2, 7.8, -6.1], 
                       [3.9, 0.4, 2.1], 
                       [7.3, -9.9, -4.5]]) 

We will be talking about several different preprocessing techniques. Let's start with binarization:

  • Binarization
  • Mean removal
  • Scaling
  • Normalization

Let's take a look at each technique, ...

Get Artificial Intelligence with Python now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.