Preprocessing data

Real-world data usually arrives raw, but machine learning algorithms expect it to be formatted in a certain way before training starts. To prepare the data for ingestion by these algorithms, we have to preprocess it and convert it into the right format. Let's see how to do it.

Create a new Python file and import the following packages:

import numpy as np 
from sklearn import preprocessing

Let's define some sample data:

input_data = np.array([[5.1, -2.9, 3.3], 
                       [-1.2, 7.8, -6.1], 
                       [3.9, 0.4, 2.1], 
                       [7.3, -9.9, -4.5]])
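
Before transforming anything, it can help to look at the shape and per-column statistics of the sample data, since the columns differ in both center and spread. The snippet below is a minimal sketch; the variable names `column_means` and `column_stds` are chosen here for illustration and are not from the original text.

```python
import numpy as np

input_data = np.array([[5.1, -2.9, 3.3],
                       [-1.2, 7.8, -6.1],
                       [3.9, 0.4, 2.1],
                       [7.3, -9.9, -4.5]])

# Each row is one sample and each column is one feature.
print("Shape:", input_data.shape)

# axis=0 computes the statistic down each column (per feature).
column_means = input_data.mean(axis=0)
column_stds = input_data.std(axis=0)

print("Mean per column:", column_means)
print("Std deviation per column:", column_stds)
```

Columns whose means sit far from zero or whose spreads differ widely are the ones that preprocessing is meant to bring onto a common footing.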

We will be talking about several different preprocessing techniques:

Binarization
Mean removal
Scaling
Normalization

Let's take a look at each technique, ...
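
As a rough preview, the four techniques listed above can be sketched with scikit-learn's `preprocessing` module. This is one possible illustration rather than the book's own listing: the binarization threshold of 2.1, the [0, 1] scaling range, and the output variable names are all assumptions made for the example.

```python
import numpy as np
from sklearn import preprocessing

input_data = np.array([[5.1, -2.9, 3.3],
                       [-1.2, 7.8, -6.1],
                       [3.9, 0.4, 2.1],
                       [7.3, -9.9, -4.5]])

# Binarization: values strictly above the threshold become 1, the rest 0.
# The threshold 2.1 is an arbitrary value picked for illustration.
data_binarized = preprocessing.Binarizer(threshold=2.1).fit_transform(input_data)

# Mean removal: center each column on 0 and scale it to unit standard deviation.
data_standardized = preprocessing.scale(input_data)

# Scaling: map each column linearly onto the range [0, 1] (assumed range).
scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
data_scaled = scaler.fit_transform(input_data)

# Normalization: rescale each row so that its absolute values sum to 1 (L1)
# or its squared values sum to 1 (L2).
data_normalized_l1 = preprocessing.normalize(input_data, norm='l1')
data_normalized_l2 = preprocessing.normalize(input_data, norm='l2')

print("Binarized:\n", data_binarized)
print("Standardized:\n", data_standardized)
print("Min-max scaled:\n", data_scaled)
print("L1 normalized:\n", data_normalized_l1)
print("L2 normalized:\n", data_normalized_l2)
```

Note that mean removal and scaling operate on columns (features), while normalization operates on rows (samples).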


Artificial Intelligence with Python by Prateek Joshi
