O'Reilly Live Online Training

Explore, visualize, and predict using pandas and Jupyter

Learn to import, explore, and tweak your data

Matt Harrison

The pandas library is very popular among data scientists, quants, Excel junkies, and Python developers because it allows you to perform data ingestion, exporting, transformation, and visualization with ease. But if you're only familiar with Python, pandas may present some challenges. Since pandas is inspired by NumPy, its syntax conventions can be confusing to Python developers.

Join Matt Harrison to learn how to interact with pandas via the REPL in the Jupyter Notebook. Over two three-hour sessions, you'll learn how to use Python for exploratory analysis, as Matt walks you through loading data, inspecting it, tweaking it, visualizing it, and using it for some basic machine learning.

What you'll learn and how you can apply it

By the end of this live online course, you’ll understand:

  • How to use Jupyter to interact with Python scripts
  • How pandas can make life easier for data scientists or programmers

And you’ll be able to:

  • Import, explore, and tweak your data with pandas
  • Understand how to get help when you get stuck
  • Use pandas to debug your analytics

This training course is for you because...

  • You're a data scientist with experience in R or SAS, and you want to learn about pandas and the Python ecosystem.
  • You're a developer with programming experience in Python who wants to better understand pandas.

Prerequisites

  • All of the coding exercises in the course will be hosted on JupyterHub, and we'll send out the URL at the start of class. The setup is purely browser-based, so no installation is required.

  • If you would like to download the files and work locally, you'll need a machine with Python (3.6+), Anaconda, and Jupyter installed.

Recommended preparation:

Introduction to pandas for Developers (video)

Python for Data Analysis, chapters 2 and 5–10 (book)

About your instructor

  • Matt runs MetaSnake, a Python and Data Science training and consulting company. He has over 15 years of experience using Python across a breadth of domains: Data Science, BI, Storage, Testing and Automation, Open Source Stack Management, and Search.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Day 1

Introduction and setup (5 minutes)

Jupyter (10 minutes)

  • Lecture: A brief review of Jupyter features and concepts

pandas (10 minutes)

  • Lecture: A brief review of pandas features and basic data structures
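
For orientation, here is a minimal sketch of the two basic pandas data structures, a Series and a DataFrame. The values and labels are made up for illustration and are not taken from the course materials.

```python
import pandas as pd

# A Series is a one-dimensional labeled array (illustrative values).
s = pd.Series([1.5, 2.5, 4.0], index=["a", "b", "c"], name="weight")

# A DataFrame is a table of columns that share one index.
df = pd.DataFrame({"weight": s, "count": [10, 20, 30]})

# Label-based access works on both structures.
weight_b = s["b"]
count_c = df.loc["c", "count"]
```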

Loading data (25 minutes)

  • Lecture: Ingesting data from the web and CSV files; options for manipulation during loading
  • Hands-on exercise: Loading data
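
As a rough idea of what loading with options can look like, the sketch below reads CSV text with `pd.read_csv` and converts columns during the load. The CSV content and column names are invented for illustration; an in-memory string stands in for a file path or URL.

```python
import io
import pandas as pd

# Made-up CSV text standing in for a file or URL (an assumption for this sketch).
csv_text = """date,city,temp_c
2021-01-01,Oslo,-3.5
2021-01-02,Oslo,-1.0
2021-01-01,Lima,24.0
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["date", "city", "temp_c"],  # load only the columns we need
    parse_dates=["date"],                # convert the date column while loading
    dtype={"city": "category"},          # choose a compact type up front
)
```

`read_csv` also accepts a local path or a URL in place of the file-like object used here.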

Break (10 minutes)

Inspecting data (30 minutes)

  • Lecture: Inspecting data and characterizing it
  • Hands-on exercise: Inspecting data
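
A first look at a dataset often uses a handful of calls like the ones sketched below. The DataFrame is made up for illustration; the course may use different data and additional methods.

```python
import pandas as pd

# Illustrative data, not from the course.
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Oslo", "Pune"],
    "temp_c": [-3.5, 24.0, -1.0, 31.2],
})

shape = df.shape                         # (rows, columns)
dtypes = df.dtypes                       # type of each column
first_rows = df.head(2)                  # peek at the first rows
city_counts = df["city"].value_counts()  # frequency of each category
summary = df.describe()                  # summary of numeric columns
```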

Tweaking data (30 minutes)

  • Lecture: Changing the types of the values, fixing them, or ignoring them
  • Hands-on exercise: Tweaking data
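
The three options named above (change a type, fix bad values, ignore a column) might look like the following sketch. The column names and values are hypothetical.

```python
import pandas as pd

# Illustrative raw data where numbers arrived as strings.
raw = pd.DataFrame({
    "price": ["10.5", "n/a", "7"],
    "qty": ["1", "2", "3"],
    "notes": ["a", "b", "c"],
})

clean = (
    raw
    .assign(
        # Fix: unparseable entries such as "n/a" become NaN.
        price=pd.to_numeric(raw["price"], errors="coerce"),
        # Change type: strings to integers.
        qty=raw["qty"].astype("int64"),
    )
    # Ignore: drop a column we do not need.
    .drop(columns=["notes"])
)
```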

Break (10 minutes)

Basic stats (40 minutes)

  • Lecture: Using pandas to easily look at descriptive analytics for your data
  • Hands-on exercise: Basic stats
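
Descriptive statistics are available as methods on Series and DataFrames. The sketch below uses made-up scores to show a few of them.

```python
import pandas as pd

# Illustrative scores, not course data.
scores = pd.DataFrame({
    "math": [70, 80, 90, 100],
    "art": [65, 85, 75, 95],
})

mean_math = scores["math"].mean()
median_art = scores["art"].median()
corr = scores["math"].corr(scores["art"])  # Pearson correlation by default
```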

Homework:

  • Find a dataset that interests you or is relevant to your work; load that data and try to get some basic statistics about it.

Day 2

Plotting (25 minutes)

  • Lecture: Using pandas's integrated plotting functionality with the Jupyter Notebook to visually inspect your data
  • Hands-on exercise: Plotting
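
A minimal plotting sketch follows, assuming matplotlib is installed alongside pandas. The data is invented, and the `Agg` backend line is only there so the snippet also runs outside a notebook; inside Jupyter the plot renders inline without it.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for running outside Jupyter

import pandas as pd

# Illustrative data.
df = pd.DataFrame({"day": [1, 2, 3, 4], "visits": [10, 15, 12, 20]})

# DataFrame.plot returns a matplotlib Axes that can be customized further.
ax = df.plot(x="day", y="visits", kind="line", title="Visits per day")
ax.set_ylabel("visits")
```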

Filtering (25 minutes)

  • Lecture: Digging into your data with pandas
  • Hands-on exercise: Filtering
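
Filtering usually comes down to boolean masks, `query`, and `loc`. The sketch below shows one example of each on made-up data.

```python
import pandas as pd

# Illustrative data.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Dee"],
    "age": [34, 19, 52, 27],
    "city": ["Oslo", "Lima", "Oslo", "Pune"],
})

# Boolean masks combine with & and |; each condition needs parentheses.
over_30_in_oslo = df[(df["age"] > 30) & (df["city"] == "Oslo")]

# query expresses the same kind of filter as a string.
under_30 = df.query("age < 30")

# loc filters rows and selects columns in one step.
selected = df.loc[df["city"].isin(["Lima", "Pune"]), ["name", "city"]]
```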

Break (10 minutes)

Dealing with NaN (25 minutes)

  • Lecture: Examining and dealing with missing values
  • Hands-on exercise: Dealing with NaN
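
Examining and handling missing values can be sketched as below. The data is made up, and whether to drop or fill (and with what) depends on the dataset; the fill choices here are only examples.

```python
import numpy as np
import pandas as pd

# Illustrative data with gaps.
df = pd.DataFrame({
    "temp": [20.0, np.nan, 24.0],
    "wind": [5.0, 7.0, np.nan],
})

missing_per_column = df.isna().sum()  # count NaN values in each column
dropped = df.dropna()                 # keep only complete rows
filled = df.fillna({"temp": df["temp"].mean(), "wind": 0.0})  # per-column fill
```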

Grouping (25 minutes)

  • Lecture: Advanced pandas features for grouping data and aggregating and returning the results
  • Hands-on exercise: Grouping
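
Grouping and aggregating follows a split, apply, combine pattern. A small sketch with invented sales figures:

```python
import pandas as pd

# Illustrative data.
sales = pd.DataFrame({
    "region": ["N", "S", "N", "S"],
    "amount": [100, 50, 300, 150],
})

# Split by region, aggregate the amount column, and return one row per group.
summary = sales.groupby("region")["amount"].agg(["sum", "mean"])
```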

Break (10 minutes)

Pivoting (25 minutes)

  • Lecture: Creating pivot tables programmatically; combining pivot tables with grouping to easily summarize your data
  • Hands-on exercise: Pivoting
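
A pivot table can be built with `pivot_table`, and a similar summary can be reached through grouping plus `unstack`. The sketch below uses made-up sales data to show both routes.

```python
import pandas as pd

# Illustrative data.
sales = pd.DataFrame({
    "region": ["N", "N", "S", "S"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "amount": [100, 300, 50, 150],
})

# Rows are regions, columns are quarters, cells are summed amounts.
pivot = sales.pivot_table(
    index="region", columns="quarter", values="amount", aggfunc="sum"
)

# The same summary via grouping, then moving quarter into the columns.
via_groupby = sales.groupby(["region", "quarter"])["amount"].sum().unstack()
```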

Machine learning (25 minutes)

  • Lecture: Using scikit-learn to do machine learning; using pandas to transform your data
  • Hands-on exercise: Machine learning
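
To illustrate how pandas can prepare data for scikit-learn, the sketch below one-hot encodes a categorical column and fits a linear regression. The data, the column names, and the choice of `LinearRegression` are assumptions for illustration; the course may use a different model and dataset.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Illustrative data where price is exactly 2 * size + 1.
df = pd.DataFrame({
    "size": [1.0, 2.0, 3.0, 4.0],
    "kind": ["a", "b", "a", "b"],
    "price": [3.0, 5.0, 7.0, 9.0],
})

# pandas transforms the text column into a numeric indicator column.
X = pd.get_dummies(df[["size", "kind"]], columns=["kind"], drop_first=True, dtype=float)
y = df["price"]

model = LinearRegression().fit(X, y)
predictions = model.predict(X)
```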

Stacking (25 minutes)

  • Lecture: Stacking with pandas; how stacking can enable easy plotting of multiple variables
  • Hands-on exercise: Stacking
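
Stacking moves column labels into the index, which gives a long layout with one row per observation and variable. A minimal sketch with invented data:

```python
import pandas as pd

# Illustrative wide table: one column per variable.
wide = pd.DataFrame(
    {"temp": [20, 22], "wind": [5, 7]},
    index=pd.Index(["Mon", "Tue"], name="day"),
)

# stack turns the columns into an inner index level (a Series with a MultiIndex).
stacked = wide.stack()

# A long table with one row per (day, variable) pair.
long = stacked.rename_axis(["day", "variable"]).reset_index(name="value")

# unstack reverses the operation.
round_trip = stacked.unstack()
```

The long layout is the shape that many plotting tools expect when several variables share one chart.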