Chapter 5. Data Analysis with pandas

Data! Data! Data! I can’t make bricks without clay!

Sherlock Holmes

This chapter is about pandas, a library for data analysis with a focus on tabular data. pandas is a powerful tool that not only provides many useful classes and functions but also does a great job of wrapping functionality from other packages. The result is a user interface that makes data analysis, and financial analysis in particular, convenient and efficient.

This chapter covers the following fundamental data structures:

Object type   Meaning                                Used for
DataFrame     2-dimensional data object with index   Tabular data organized in columns
Series        1-dimensional data object with index   Single (time) series of data
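
As a rough illustration of the two object types, the following minimal sketch builds a small DataFrame and pulls a single column out of it as a Series. The column names, index labels, and values are hypothetical and chosen only for illustration; they do not come from the chapter.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {"price": [10.0, 20.0, 30.0], "volume": [100, 200, 300]},
    index=["a", "b", "c"],
)

# Selecting one column of a DataFrame yields a Series
# that carries the same index as the DataFrame.
s = df["price"]

print(type(df).__name__, df.ndim)  # 2-dimensional object with index
print(type(s).__name__, s.ndim)    # 1-dimensional object with index
```

Both objects share the index, which is what keeps the column aligned with its row labels once it is taken out of the table.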

The chapter is organized as follows:

“The DataFrame Class”

This section starts by exploring the basic characteristics and capabilities of the DataFrame class of pandas by using simple and small data sets; it then shows how to transform a NumPy ndarray object into a DataFrame object.

“Basic Analytics” and “Basic Visualization”

Basic analytics and visualization capabilities are introduced in these sections (later chapters go deeper into these topics).

“The Series Class”

This rather brief section covers the Series class of pandas, which in a sense represents a special case of the DataFrame class with a single column of data only.

“GroupBy Operations”

One of the strengths of the DataFrame class lies in grouping data according to a single or multiple ...
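
Two of the operations named in the outline above, transforming a NumPy ndarray object into a DataFrame object and grouping data by a column, can be sketched as follows. This is a minimal sketch under stated assumptions: the array contents, the labels x, y, and group, and the use of the mean as the aggregate are all hypothetical and not taken from the chapter.

```python
import numpy as np
import pandas as pd

# Hypothetical 4x2 array of floats: [[0, 1], [2, 3], [4, 5], [6, 7]].
a = np.arange(8, dtype=float).reshape(4, 2)

# Wrap the ndarray in a DataFrame, adding column and index labels.
df = pd.DataFrame(a, columns=["x", "y"], index=["a", "b", "c", "d"])

# Add a hypothetical grouping column, then aggregate per group.
df["group"] = ["g1", "g2", "g1", "g2"]
means = df.groupby("group")[["x", "y"]].mean()

print(df)
print(means)
```

Grouping by several columns works the same way in pandas, with a list of column names passed to groupby instead of a single name.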
