Exploring a dataset with pandas and Matplotlib

In this first recipe, we show how to conduct a preliminary analysis of a dataset with pandas, which is typically the first step after getting access to the data. pandas lets us load the data, explore the variables, and make basic plots with Matplotlib.

We will take a look at a dataset containing all ATP matches played by the Swiss tennis player Roger Federer until 2012.

How to do it...

  1. We import NumPy, pandas, and Matplotlib, along with datetime from the standard library:
    >>> from datetime import datetime
        import numpy as np
        import pandas as pd
        import matplotlib.pyplot as plt
        %matplotlib inline
  2. The dataset is a CSV file—that is, a text file with comma-separated values. pandas lets us load this file with a single function:
    >>> player = ...
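
The loading call is cut off in this excerpt, so here is a minimal, hedged sketch of the general pattern rather than the recipe's actual code. It assumes the single function is `pd.read_csv`, which accepts a file path, a URL, or a file-like object. The CSV text, the column names (`year`, `tournament`, `winner`), and the variable `toy` are invented placeholders, not the Federer dataset.

```python
import io

import pandas as pd

# Placeholder CSV content standing in for the real file.
# These rows and column names are invented for illustration only.
csv_text = """year,tournament,winner
2001,Example Open,Player A
2002,Sample Cup,Player B
2002,Example Open,Player A
"""

# read_csv accepts a path, a URL, or a file-like object such as StringIO.
toy = pd.read_csv(io.StringIO(csv_text))

# Typical first look at a freshly loaded table.
print(toy.shape)                    # (number of rows, number of columns)
print(toy.dtypes)                   # inferred type of each column
print(toy.head(2))                  # first two rows
print(toy['year'].value_counts())   # number of rows per year
```

With a real file, the `io.StringIO(...)` argument would be replaced by the file's path or URL; the inspection calls stay the same.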
