Mastering pandas by Femi Anthony

Data analysis and preprocessing using pandas

In this section, we will use pandas to analyze and preprocess the data before passing it as input to scikit-learn.
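
Many scikit-learn estimators expect purely numeric features with no missing values. The sketch below shows one common pandas pattern for getting there: encode a text column as integers, fill gaps with the column median, and split the result into NumPy arrays. The DataFrame, its column names (`color`, `age`, `label`) and the integer mapping are hypothetical stand-ins, not the book's dataset or its exact preprocessing steps.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one text column, one numeric column with a
# missing value, and a target column.
df = pd.DataFrame({
    "color": ["red", "blue", "blue", "red"],
    "age": [22.0, np.nan, 26.0, 35.0],
    "label": [0, 1, 1, 0],
})

# Encode the text column as integers (an assumed, arbitrary mapping).
df["color"] = df["color"].map({"blue": 0, "red": 1})

# Replace missing ages with the median of the observed ages.
df["age"] = df["age"].fillna(df["age"].median())

# Split into a feature matrix X and a target vector y as NumPy arrays.
X = df[["color", "age"]].to_numpy()
y = df["label"].to_numpy()

print(X.shape, y.shape)
```

Here the median of the observed ages (22, 26, 35) is 26.0, so the missing entry becomes 26.0. Median filling is only one choice; it is less sensitive to outliers than the mean.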

Examining the data

To begin preprocessing, let us read in the training dataset and examine what it looks like.

Here, we read the training dataset into a pandas DataFrame and display its first three rows:

In [2]: import pandas as pd
        import numpy as np
        # header=0 tells read_csv to use row 0 of the file as the column names
        train_df = pd.read_csv('csv/train.csv', header=0)
In [3]: train_df.head(3)

The output is as follows:

[Figure: the first three rows of train_df]

Thus, we can see the various ...
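
Beyond `head()`, a few other DataFrame attributes and methods give a quick overview of a dataset before any preprocessing. The sketch below reads a small in-memory CSV with `header=0` and prints the first rows, the shape, the inferred column types and the number of missing values per column. The CSV contents and column names (`id`, `label`, `age`, `city`) are hypothetical; with the real file you would pass `'csv/train.csv'` instead of the `StringIO` object.

```python
import io

import pandas as pd

# Hypothetical CSV text; the first line holds the column names.
csv_text = """id,label,age,city
1,0,22.0,A
2,1,,B
3,1,26.0,
4,0,35.0,A
"""

# header=0: use row 0 of the file as the column names.
train_df = pd.read_csv(io.StringIO(csv_text), header=0)

print(train_df.head(3))         # first three rows
print(train_df.shape)           # (number of rows, number of columns)
print(train_df.dtypes)          # inferred type of each column
print(train_df.isnull().sum())  # count of missing values per column
```

Empty fields are parsed as `NaN`, which is why `age` is inferred as a float column and both `age` and `city` report one missing value. These counts are usually what drives the later decisions about filling or dropping data.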
