Clean, effective data analysis with Python
The Pandas library from .head() to .tail()
You’ll learn how to solve common problems in data analysis by writing clean, readable, efficient code. Pandas will be the primary tool, though integrations with other libraries like scikit-learn, statsmodels, and matplotlib will be demonstrated. The emphasis is on gradually learning methods for massaging data into the correct form through real applications, rather than on an exhaustive walk-through of pandas' API.
This course is aimed at beginner and intermediate PyData users. It covers practical topics such as:
- The basics of NumPy and its relationship to pandas
- Selecting and indexing
- Reshaping and tidy data
- Grouped operations and summarization
- Merging and joining
- Interaction with other PyData libraries (statistics and visualization)
- Some of the more specialized areas of pandas, including Categoricals, time-series analysis, hierarchical indexes, chunked/out-of-core processing, and data pipelines
What you'll learn and how you can apply it
By the end of this live, hands-on, online course, you’ll understand:
- The subset of the pandas API that covers the most common problems in data wrangling
- Where pandas fits in the broader scientific Python ecosystem
- How to get data in and out of a pandas DataFrame
- How pandas interacts with other PyData libraries, like matplotlib, scikit-learn, and statsmodels
- Fundamental data wrangling techniques like reshaping, groupby, and filtering
And you'll be able to:
- Compute simple and sophisticated group-wise summary statistics
- Write efficient and idiomatic pandas code
- Clean, reshape, and join datasets in preparation for statistical learning or visualization
This training course is for you because...
- You are a data analyst who needs to preprocess messy data before feeding it to a machine learning algorithm or visualization library
- You have some experience with Python and its built-in data structures.
- Experience with NumPy and vectorized computation will be helpful, but not required, to get the most out of the training.
About your instructor
The timeframes are only estimates and may vary depending on how the class is progressing.
Each section is associated with a Jupyter notebook. The instructor will guide the discussion by introducing the large themes of each topic. Each notebook contains many small exercises (and solutions) for checking your understanding as we progress through the notebooks.
Segment 1: Introduction (5 min)
Introduction to your instructor and pandas
Segment 2: Setup and Jupyter Introduction (5 min)
Clone the repository, follow the setup instructions
Segment 3: Indexing (40 min)
How to select subsets of your data
Break (5 min)
Segment 4: Alignment and Operations (25 min)
How pandas uses row-labels to do alignment
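As a rough illustration of what label alignment means, here is a minimal sketch; the Series names and values are made up for the example and are not taken from the course notebooks. Arithmetic between two Series matches values by row label rather than by position, and labels found in only one operand produce NaN unless a fill value is supplied.

```python
import pandas as pd

# Two Series with overlapping, differently ordered row labels (made-up data)
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["z", "y", "w"])

# Addition pairs values by label, not by position.
# Labels present in only one operand ("x" and "w") become NaN.
total = a + b

# Series.add with fill_value treats a missing label as 0 before adding
total_filled = a.add(b, fill_value=0)

print(total)
print(total_filled)
```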
Segment 5: Tidy Data (30 min)
Tidy data in pandas
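One common way to tidy a "wide" table is `DataFrame.melt`. The sketch below uses a hypothetical table of values per city and year; the column names and numbers are assumptions for illustration, not course material. It reshapes the table so that each row holds a single observation.

```python
import pandas as pd

# Hypothetical wide table: one column per year
wide = pd.DataFrame({
    "city": ["A", "B"],
    "2019": [1.0, 2.0],
    "2020": [3.0, 4.0],
})

# Melt the year columns into (year, value) pairs: one row per city and year
tidy = wide.melt(id_vars="city", var_name="year", value_name="value")

print(tidy)
```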
Segment 6: Day 1 Wrap (5 min)
Review of the topics covered, preview of the next day
Segment 7: Groupby (35 min)
Segment 8: Visualization (35 min)
Plotting with matplotlib, pandas, seaborn, and Altair
Break (5 min)
Segment 9: Performance (35 min)
How to avoid writing slow pandas code
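A typical source of slow pandas code is looping over rows in Python where a vectorized column operation would do. The following sketch uses a made-up DataFrame with hypothetical `price` and `qty` columns and computes the same result both ways; it is meant only to show the pattern, not to reproduce the course's benchmarks.

```python
import numpy as np
import pandas as pd

# Made-up data: 1,000 rows with hypothetical "price" and "qty" columns
df = pd.DataFrame({
    "price": np.arange(1.0, 1001.0),
    "qty": np.arange(1000) % 7,
})

# Row-by-row: a Python-level loop over every row
slow = pd.Series(
    [row.price * row.qty for row in df.itertuples()],
    index=df.index,
)

# Vectorized: one operation over whole columns
fast = df["price"] * df["qty"]

# Both approaches give the same values
print((slow == fast).all())
```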
Segment 10: Timeseries (15 min)
Brief introduction to manipulating timeseries data
Segment 14: Integrations (15 min)
Brief examples of how pandas plugs into statsmodels and scikit-learn
Segment 15: Day 2 Wrap (5 min)
Review of the topics covered, pointers to further resources