O'Reilly Live Online Training

Beginning Data Analysis with Python and Jupyter

Create reproducible data analyses with Python and Jupyter Notebooks

Alex Galea

Data science continues to grow in popularity as more industries recognize its value. Recent advancements in open source software have made the discipline accessible to a wide range of people. Python is a popular choice among data scientists owing to its ease of use and versatility. Jupyter Notebook complements it as a virtual playground that allows you to create and share live code, equations, visualizations, and text. Both tools offer abstractions over programming-intensive algorithms, allowing you to better conceptualize the problems you face and reduce the amount of programming required to solve them.

The goal of this training course is to help you get the most out of Python and Jupyter Notebook so that you can complete tricky data science tasks quickly. By touching on a variety of topics within the discipline, you'll be exposed to many interesting examples that use realistic data.

The training course starts with the basics of Jupyter, which will be the backbone of the course. After familiarizing yourself with its standard features, you'll see it in practice with your first analysis. The next lesson dives right into predictive analytics, where you'll implement multiple classification algorithms. Finally, you'll look at data collection techniques: you'll learn how web data can be acquired with scraping techniques and via APIs, and then briefly explore interactive visualizations.

What you'll learn and how you can apply it

  • Identify areas of investigation within a data set
  • Develop a plan for doing data science
  • Define exploratory analysis
  • Prepare data for modeling
  • Implement predictive analytics
  • Collect data with web scraping
  • Explore various data visualization techniques

This training course is for you because...

You are interested in data analysis. The topics covered are relevant to a variety of job descriptions across a large range of industries.

Prerequisites

For the best experience in this training course, you should have knowledge of programming fundamentals and some experience with Python. In particular, some familiarity with the following libraries is helpful:

  • Pandas
  • Matplotlib

Materials, Software, Downloads, or Supplemental Content needed in advance:

  • Please download the repository prior to the training session here: https://github.com/agalea91/Beginning-Data-Science-with-Jupyter

  • Python 3.5+

  • Anaconda 4.3+

Python libraries included with the Anaconda installation:

  • matplotlib 2.1.0+
  • ipython 6.1.0+
  • requests 2.18.4+
  • beautifulsoup4 4.6.0+
  • numpy 1.13.1+
  • pandas 0.20.3+
  • scikit-learn 0.19.0+
  • seaborn 0.8.0+
  • bokeh 0.12.10+
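
The minimum versions listed above can be checked before the session with a short script. The following is only a sketch, not part of the course materials: the helper names `version_tuple`, `meets_minimum`, and `check_environment` are hypothetical, and it assumes Python 3.8 or later, because `importlib.metadata` is not available in the standard library of older interpreters.

```python
import re
from importlib import metadata  # assumption: Python 3.8+ (standard library)

# Minimum versions copied from the list above, keyed by distribution name.
MINIMUM_VERSIONS = {
    "matplotlib": "2.1.0",
    "ipython": "6.1.0",
    "requests": "2.18.4",
    "beautifulsoup4": "4.6.0",
    "numpy": "1.13.1",
    "pandas": "0.20.3",
    "scikit-learn": "0.19.0",
    "seaborn": "0.8.0",
    "bokeh": "0.12.10",
}


def version_tuple(version):
    """Turn '0.12.10' into (0, 12, 10), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums):
    """Return {name: (installed_version_or_None, meets_minimum)}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment(MINIMUM_VERSIONS).items():
        status = "OK" if ok else "missing or too old"
        print(f"{name:15s} {installed or '-':10s} {status}")
```

Comparing numeric tuples rather than raw strings matters here: as plain strings, "0.9.0" would sort after "0.12.10", even though it is the older release.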

Python libraries that require manual installation (optional):

  • mlxtend
  • version_information
  • ipython-sql
  • pdir2
  • graphviz

Hardware requirements

  • Processor: Intel i5 (or equivalent)
  • Memory: 8 GB RAM
  • Hard disk: 10 GB
  • Solid Internet connection

About your instructor

  • Alex Galea is a data analyst and a Python expert. He has been doing data analysis professionally since graduating with an M.Sc. in Physics from the University of Guelph in Canada. He developed a keen interest in Python while researching quantum gases as part of his graduate studies.

    More recently, Alex has been doing web-data analytics, where Python has continued to play a large part in his work. He frequently blogs about work and personal projects, which are generally data-centric and usually involve Python and Jupyter Notebooks.

    Alex has recently published his first courseware with Packt on the same topic.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

DAY 1

Lesson 1: Jupyter Fundamentals (2h 45m)

  • Lesson Introduction
  • Topic A: Basic Functionality and Features
  • Break (15m)
  • Topic B: Our First Analysis – The Boston Housing Dataset
  • Activity B: Building a Third-Order Polynomial Model
  • Summary
  • Practice Questions

DAY 2

Lesson 2: Data Cleaning and Advanced Machine Learning (2h)

  • Lesson Introduction
  • Topic A: Preparing to Train a Predictive Model
  • Activity A: Preparing to Train a Predictive Model for the Employee-Retention Problem
  • Break (30m)
  • Topic B: Training Classification Models
  • Summary
  • Practice Questions
  • Break

Lesson 3: Web Scraping and Interactive Visualizations (1h 45m)

  • Lesson Introduction
  • Topic A: Scraping Web Page Data
  • Activity A: Web Scraping with Jupyter Notebook
  • Break (15m)
  • Topic B: Interactive Visualizations
  • Activity B: Exploring Data with Interactive Visualizations
  • Summary
  • Practice Questions