Live online training

Understanding statistical inference with Python

A computational approach to estimation and hypothesis testing

Allen Downey

Do you know the difference between standard deviation and standard error? What about the importance of p-values or confidence intervals? Most people don’t really understand these concepts even after taking several statistics classes. The problem is that these courses focus on mathematical methods, burying the concepts under a mountain of details.

Join expert Allen Downey for a computational approach to statistical inference that uses random simulations instead of mathematical equations. Drawing on his book Think Stats, his courses at Olin College, and his blog, Probably Overthinking It, Allen walks you through using Python to implement simple statistical experiments and shares examples using real-world data to answer the three fundamental questions of statistical inference: how to use data to estimate the size of whatever effect you observe, how to quantify the precision of that estimate, and how to decide whether the apparent effect might be due to chance.

What you'll learn and how you can apply it

By the end of this live, online course, you’ll understand:

  • The goals of statistical inference: estimating the size of an effect, quantifying the precision of the estimate, and testing hypotheses
  • The limitations and hazards of statistical inference, including sampling bias, measurement error, and some causes of false-positive hypothesis tests

And you’ll be able to:

  • Use computational tools to compute effect sizes, confidence intervals, standard errors, and p-values
  • Choose appropriate statistics to measure effect size and test hypotheses
  • Communicate statistical results to both technical and nontechnical audiences

This training course is for you because...

  • You're a scientist designing experiments, interpreting data, and presenting results.
  • You're an engineer developing statistical analysis pipelines that turn data into actionable information.
  • You're a data scientist with only a vague memory of past statistics classes who needs to explain results clearly to collaborators and clients.
  • You want to better understand statistical methods and implications.

Prerequisites

  • A working knowledge of Python and basic statistics concepts (mean, standard deviation, median, etc.)

  • All coding exercises are hosted on JupyterHub; we'll send the URL at the start of class. The environment is purely browser-based, so no installation is required.

Recommended preparation:

Losing your Loops: Fast Numerical Computing with NumPy (video)

"Classes and Methods" (chapter in Think Python)

About your instructor

  • Allen Downey is a professor of Computer Science at Olin College and the author of a series of free, open-source textbooks related to software and data science, including Think Python, Think Bayes, and Think Complexity, published by O’Reilly Media. His blog, Probably Overthinking It, features articles on Bayesian probability and statistics. He holds a Ph.D. in computer science from U.C. Berkeley, and M.S. and B.S. degrees from MIT. He lives near Boston, MA with his wife and two daughters.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Introduction: What is statistical inference? (10 minutes)

  • Lecture: The p-value ban and why most published research findings are false; an inference example: drug testing
  • Q&A

Effect size (50 minutes)

  • Lecture: Report effect size first (everything else is secondary)
  • Hands-on exercises: Explore the difference in means, absolute and relative difference, and Cohen’s effect size; explore the difference in proportions, odds ratios, and log odds ratios
  • Q&A
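
As a rough illustration of the statistics named in this section (not the course's own notebook code), the sketch below computes an absolute and relative difference in means, Cohen's effect size, and an odds ratio with NumPy. The function names, the simulated data, and the pooled-variance convention (population variances weighted by sample size) are assumptions made for this example.

```python
import numpy as np


def cohen_effect_size(group1, group2):
    """Difference in means divided by a pooled standard deviation.

    Assumption: the pooled variance weights each group's population
    variance by its sample size; other conventions exist.
    """
    g1 = np.asarray(group1, dtype=float)
    g2 = np.asarray(group2, dtype=float)
    n1, n2 = len(g1), len(g2)
    pooled_var = (n1 * g1.var() + n2 * g2.var()) / (n1 + n2)
    return (g1.mean() - g2.mean()) / np.sqrt(pooled_var)


def odds_ratio(p1, p2):
    """Ratio of the odds implied by two proportions strictly between 0 and 1."""
    return (p1 / (1 - p1)) / (p2 / (1 - p2))


# Hypothetical data: two simulated groups, not real measurements.
rng = np.random.default_rng(17)
treated = rng.normal(loc=10.5, scale=2.0, size=200)
control = rng.normal(loc=10.0, scale=2.0, size=200)

abs_diff = treated.mean() - control.mean()   # absolute difference in means
rel_diff = abs_diff / control.mean()         # relative to the control mean
d = cohen_effect_size(treated, control)      # standardized difference

# Hypothetical proportions for the odds ratio and its logarithm.
ratio = odds_ratio(0.5, 0.25)
log_ratio = np.log(ratio)

print(f"absolute: {abs_diff:.3f}, relative: {rel_diff:.3%}, Cohen's d: {d:.3f}")
print(f"odds ratio: {ratio:.3f}, log odds ratio: {log_ratio:.3f}")
```

The absolute difference keeps the original units, while Cohen's effect size expresses the same gap in standard deviations, which makes it comparable across measurements with different scales.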

Break (10 minutes)

Quantifying precision (60 minutes)

  • Lecture: Sampling bias, measurement error, and random error; sampling statistics and sampling distributions; the difference between standard deviation and standard error; quantifying precision; the limitations of confidence intervals
  • Hands-on exercises: Generate sampling distributions by simulation; estimate sampling distributions by resampling; use the Resampler framework to compute the sampling distribution for Cohen’s effect size
  • Q&A
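
To make the idea of estimating a sampling distribution by resampling concrete, here is a minimal sketch using plain NumPy. It does not reproduce the Resampler framework used in the exercises; `resample_statistic`, `summarize`, and the simulated sample are hypothetical names and data chosen for this example.

```python
import numpy as np


def resample_statistic(sample, statistic, iters=1000, seed=None):
    """Approximate the sampling distribution of `statistic` by drawing
    resamples of the same size from `sample`, with replacement."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    n = len(sample)
    return np.array([
        statistic(rng.choice(sample, size=n, replace=True))
        for _ in range(iters)
    ])


def summarize(estimates, level=90):
    """Return the standard error and a percentile-based confidence interval."""
    tail = (100 - level) / 2
    low, high = np.percentile(estimates, [tail, 100 - tail])
    # The standard error is the standard deviation of the sampling distribution.
    return estimates.std(), (low, high)


# Hypothetical sample: simulated values, not real measurements.
rng = np.random.default_rng(42)
sample = rng.normal(loc=5.0, scale=2.0, size=400)

estimates = resample_statistic(sample, np.mean, iters=1000, seed=1)
std_err, (ci_low, ci_high) = summarize(estimates)

print(f"standard deviation of the sample: {sample.std():.3f}")
print(f"standard error of the mean:       {std_err:.3f}")
print(f"90% confidence interval:          ({ci_low:.3f}, {ci_high:.3f})")
```

The two printed spreads answer different questions: the standard deviation describes how much individual values vary, while the standard error describes how much the estimated mean would vary from one sample to the next.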

Break (10 minutes)

Hypothesis testing (50 minutes)

  • Lecture: The logic of the null hypothesis significance test (NHST); interpreting p-values; the limitations of hypothesis testing
  • Hands-on exercises: Test difference in means by permutation; use the HypothesisTest framework
  • Q&A
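
A permutation test for a difference in means can be sketched in a few lines of NumPy. This is an illustrative version under stated assumptions, not the HypothesisTest framework from the exercises: the function name `permutation_p_value`, the two-sided test statistic, and the simulated groups are all choices made for this example.

```python
import numpy as np


def permutation_p_value(group1, group2, iters=1000, seed=None):
    """Estimate a two-sided p-value for the difference in means.

    Under the null hypothesis the group labels are interchangeable, so we
    shuffle the pooled data and count how often the shuffled difference is
    at least as large as the observed one.
    """
    rng = np.random.default_rng(seed)
    g1 = np.asarray(group1, dtype=float)
    g2 = np.asarray(group2, dtype=float)
    observed = abs(g1.mean() - g2.mean())

    pooled = np.concatenate([g1, g2])
    n1 = len(g1)
    count = 0
    for _ in range(iters):
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:n1].mean() - shuffled[n1:].mean())
        if diff >= observed:
            count += 1
    return count / iters


# Hypothetical data: two simulated groups with a small true difference.
rng = np.random.default_rng(7)
treated = rng.normal(loc=10.5, scale=2.0, size=100)
control = rng.normal(loc=10.0, scale=2.0, size=100)

p_value = permutation_p_value(treated, control, iters=1000, seed=3)
print(f"estimated p-value: {p_value:.3f}")
```

With a finite number of iterations the result is itself an estimate, so a reported value of 0 means only that no shuffled difference reached the observed one in that many tries.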

Other resources, wrap-up, and Q&A (20 minutes)