Live online training

Getting started with pandas

Data ingestion, tweaking, and summarizing

Matt Harrison

The pandas library allows you to perform data ingestion, exporting, transformation, and visualization with ease. As a result, it's very popular among data scientists, quants, Excel junkies, and Python developers.

Matt Harrison leads a deep dive into some advanced features of pandas, such as plotting, the integration with matplotlib, and filtering data. Using the Jupyter Notebook, you'll load data, inspect it, tweak it, visualize it, and do some analysis with only a few lines of code. By the end of this three-hour hands-on training, you’ll be able to use the split-apply-combine paradigm with GroupBy and pivot and be familiar with stacking and unstacking data.
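As a rough illustration of the split-apply-combine, pivot, and stack/unstack ideas mentioned above, here is a minimal sketch. The `sales` table and its column names are hypothetical sample data, not course material.

```python
import pandas as pd

# Hypothetical sample data: units sold per store and year.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "year": [2020, 2021, 2020, 2021],
    "units": [3, 5, 2, 7],
})

# Split-apply-combine: split rows by store, sum each group, combine the results.
per_store = sales.groupby("store")["units"].sum()

# Pivot: one row per store, one column per year.
wide = sales.pivot(index="store", columns="year", values="units")

# Stack moves the column labels into an inner index level (wide to long).
stacked = wide.stack()

# Unstack moves that inner index level back into columns (long to wide).
round_trip = stacked.unstack()

print(per_store)
print(wide)
print(stacked)
```

Stacking and unstacking are inverse reshapes here, so `round_trip` has the same layout as `wide`.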

What you'll learn and how you can apply it

By the end of this live online course, you’ll understand:

  • How pandas can make life easier for data scientists and programmers
  • How to use Jupyter to interact with Python scripts

And you’ll be able to:

  • Import, explore, and tweak data with pandas
  • Understand how to get help when you get stuck
  • Practice debugging while doing analytics with pandas

This training course is for you because...

  • You're a data scientist with experience in R or SAS who wants to learn about pandas and the Python ecosystem.
  • You're a developer with programming experience in Python who wants to start using pandas.

Prerequisites

  • Programming experience with Python
  • A prior introduction to the basics of pandas will be helpful but is not necessary. (See “Recommended preparation” below.)

Recommended preparation:

Recommended follow-up:

About your instructor

  • Matt runs MetaSnake, a Python and Data Science training and consulting company. He has over 15 years of experience using Python across a breadth of domains: Data Science, BI, Storage, Testing and Automation, Open Source Stack Management, and Search.

Schedule

The timeframes are only estimates and may vary depending on how the class is progressing.

Set up and introduction to Jupyter (15 minutes)

  • Lecture: Jupyter features

Introduction to pandas (10 minutes)

  • Lecture: pandas basic data structures
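
The two basic pandas data structures are `Series` and `DataFrame`. The following is a minimal sketch of both; the names and values are hypothetical sample data chosen for illustration.

```python
import pandas as pd

# A Series is a one-dimensional labeled array (hypothetical sample values).
ages = pd.Series([34, 29, 41], index=["ann", "bob", "cy"], name="age")

# A DataFrame is a table whose columns share a single index.
people = pd.DataFrame(
    {"age": [34, 29, 41], "city": ["Oslo", "Lima", "Pune"]},
    index=["ann", "bob", "cy"],
)

# Selecting one column from a DataFrame gives back a Series.
age_column = people["age"]

print(type(age_column).__name__)
print(people.shape)
```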

Loading data (25 minutes)

  • Lecture: Ingesting data from the web and CSV files; exploring some of the options for manipulation during loading
  • Hands-on exercise: Load data
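
A sketch of what loading with options can look like, assuming a small hypothetical CSV held in memory so the example runs offline. `pd.read_csv` also accepts a file path or a URL in place of the buffer.

```python
import io

import pandas as pd

# Hypothetical CSV content; "missing" is a made-up marker for absent values.
csv_text = """date,store,units,price
2021-01-01,A,3,9.99
2021-01-02,B,missing,4.50
2021-01-03,A,5,9.99
"""

sales = pd.read_csv(
    io.StringIO(csv_text),        # a path or URL string works here too
    parse_dates=["date"],         # convert this column to datetimes while loading
    na_values=["missing"],        # treat this marker as a missing value
    dtype={"store": "category"},  # store repeated labels compactly
)

print(sales.dtypes)
print(sales)
```

Handling types and missing-value markers at load time often saves a cleanup step later.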

Break (10 minutes)

Inspecting data (30 minutes)

  • Lecture: Examining your data, characterizing it, and seeing what it looks like
  • Hands-on exercise: Inspect your data
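
A few common calls for examining and characterizing a table, shown on hypothetical sample data. This is one possible starting set, not necessarily the one used in the course.

```python
import pandas as pd

# Hypothetical sample data with one missing value in "units".
df = pd.DataFrame({
    "store": ["A", "B", "A", "C"],
    "units": [3, None, 5, 2],
    "price": [9.99, 4.50, 9.99, 12.00],
})

print(df.head(2))   # first rows: what the data looks like
print(df.dtypes)    # the type of each column
df.info()           # index, column types, and non-null counts

# Count missing values per column.
missing_per_column = df.isna().sum()

# Count how often each category appears.
store_counts = df["store"].value_counts()

# Summary statistics for the numeric columns.
summary = df.describe()
print(summary)
```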

Tweaking data (30 minutes)

  • Lecture: Changing the types of the values for your data, fixing them, or ignoring them
  • Hands-on exercise: Tweak your data
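
The three options named above (changing types, fixing values, ignoring them) might look like the following sketch. The data is hypothetical, and filling with the median is just one possible fix.

```python
import pandas as pd

# Hypothetical raw data: numbers stored as text, with a bad and a missing entry.
raw = pd.DataFrame({
    "units": ["3", "five", "5", None],
    "store": ["A", "B", "A", "B"],
})

# Changing the type: values that cannot be parsed as numbers become NaN.
units = pd.to_numeric(raw["units"], errors="coerce")

# Fixing: fill the gaps with the median and convert labels to a categorical type.
fixed = raw.assign(
    units=units.fillna(units.median()),
    store=raw["store"].astype("category"),
)

# Ignoring: drop the rows whose value could not be recovered.
ignored = raw.assign(units=units).dropna(subset=["units"])

print(fixed)
print(ignored)
```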

Break (10 minutes)

Basic stats (40 minutes)

  • Lecture: The functionality that pandas provides to easily look at descriptive analytics for your data
  • Hands-on exercise: Use basic stats to gain insight from your data
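
A short sketch of descriptive statistics in pandas, using hypothetical numbers in which price falls as units rise.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "units": [3, 4, 5, 8],
    "price": [10.0, 9.0, 8.0, 5.0],
})

mean_units = df["units"].mean()
median_units = df["units"].median()

# Quartiles of a single column.
quartiles = df["units"].quantile([0.25, 0.75])

# Pearson correlation between two columns.
corr = df["units"].corr(df["price"])

# Count, mean, std, min, quartiles, and max for every numeric column.
summary = df.describe()

print(mean_units, median_units)
print(quartiles)
print(corr)
print(summary)
```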

Wrap-up and Q&A (10 minutes)