Understanding statistical inference with Python
A computational approach to estimation and hypothesis testing
Do you know the difference between standard deviation and standard error? What about the interpretation of p-values and confidence intervals? Most people don’t really understand these concepts even after taking several statistics classes. The problem is that these courses focus on mathematical methods, burying the concepts under a mountain of details.
Join expert Allen Downey for a computational approach to statistical inference that uses random simulations instead of mathematical equations. Drawing on his book Think Stats, his courses at Olin College, and his blog, Probably Overthinking It, Allen walks you through using Python to implement simple statistical experiments and shares examples using real-world data to answer the three fundamental questions of statistical inference: how to use data to estimate the size of whatever effect you observe, how to quantify the precision of that estimate, and how to decide whether the apparent effect might be due to chance.
What you'll learn and how you can apply it
By the end of this live, online course, you’ll understand:
 The goals of statistical inference: estimating the size of an effect, quantifying the precision of the estimate, and testing hypotheses
 The limitations and hazards of statistical inference, including sampling bias, measurement error, and some causes of false-positive hypothesis tests
And you’ll be able to:
 Use computational tools to compute effect sizes, confidence intervals, standard errors, and p-values
 Choose appropriate statistics to measure effect size and test hypotheses
 Communicate statistical results to both technical and nontechnical audiences
This training course is for you because...
 You're a scientist designing experiments, interpreting data, and presenting results.
 You're an engineer developing statistical analysis pipelines that turn data into actionable information.
 You're a data scientist with only a vague memory of past statistics classes who needs to explain results clearly to collaborators and clients.
 You want to better understand statistical methods and implications.
Prerequisites

A working knowledge of Python and basic statistics concepts (mean, standard deviation, median, etc.)

All of the coding exercises in the course will be hosted on JupyterHub, and we'll send the URL out at the start of class. The environment is purely browser-based, so no installation is required.
System Test:
To test whether you will be able to run the Jupyter notebooks in your upcoming training, please:
Navigate here: https://notebook.oreillyjupyterhub.com (This is the link to the test site)
 Sign in with your Safari credentials
 Click "start my server"

Click on "notebook .ipynb"

Run each of the code cells: click the cell then either press Shift+Return or click the triangle in the top menu

There may be a delay of a few seconds, but you should eventually see the graphs. If you do not, this probably means that your firewall is blocking JupyterHub's websockets. Please turn off your company VPN or ask your system administrator to allow them.
Recommended preparation:
Losing your Loops: Fast Numerical Computing with NumPy (video)
"Classes and Methods" (chapter in Think Python)
About your instructor

Allen Downey is a professor of Computer Science at Olin College and the author of a series of free, open-source textbooks related to software and data science, including Think Python, Think Bayes, and Think Complexity, published by O’Reilly Media. His blog, Probably Overthinking It, features articles on Bayesian probability and statistics. He holds a Ph.D. in computer science from U.C. Berkeley, and M.S. and B.S. degrees from MIT. He lives near Boston, MA with his wife and two daughters.
Schedule
The timeframes are only estimates and may vary according to how the class is progressing.
Introduction: What is statistical inference? (10 minutes)
 Lecture: The p-value ban and why most published research findings are false; an inference example (drug testing)
 Q&A
Effect size (50 minutes)
 Lecture: Report effect size first (everything else is secondary)
 Hands-on exercises: Explore the difference in means, absolute and relative difference, and Cohen’s effect size; explore the difference in proportions, odds ratios, and log odds ratios
 Q&A
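As a rough preview of the effect-size exercises, the sketch below computes an absolute difference, a relative difference, and Cohen's effect size for two groups. The data are synthetic, and the function name `cohen_effect_size` and the variables `treated` and `control` are illustrative assumptions, not the course's own notebook code.

```python
import numpy as np

def cohen_effect_size(group1, group2):
    """Difference in means divided by the pooled standard deviation."""
    group1 = np.asarray(group1, dtype=float)
    group2 = np.asarray(group2, dtype=float)
    n1, n2 = len(group1), len(group2)
    diff = group1.mean() - group2.mean()
    # Pooled variance: the two group variances weighted by group size.
    pooled_var = (n1 * group1.var() + n2 * group2.var()) / (n1 + n2)
    return diff / np.sqrt(pooled_var)

# Synthetic example data (made up for illustration only).
rng = np.random.default_rng(seed=1)
treated = rng.normal(loc=178, scale=7.7, size=1000)
control = rng.normal(loc=163, scale=7.3, size=1000)

abs_diff = treated.mean() - control.mean()
rel_diff = abs_diff / control.mean() * 100
d = cohen_effect_size(treated, control)

print(f"absolute difference: {abs_diff:.2f}")
print(f"relative difference: {rel_diff:.1f}%")
print(f"Cohen's effect size: {d:.2f}")
```

Cohen's effect size expresses the difference in units of standard deviation, which makes it comparable across measurements with different scales.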
Break (10 minutes)
Quantifying precision (60 minutes)
 Lecture: Sampling bias, measurement error, and random error; sampling statistics and sampling distributions; the difference between standard deviation and standard error; quantifying precision; the limitations of confidence intervals
 Hands-on exercises: Generate sampling distributions by simulation; estimate sampling distributions by resampling; use the Resampler framework to compute the sampling distribution for Cohen’s effect size
 Q&A
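The idea of estimating a sampling distribution by resampling can be sketched in a few lines. This is a minimal illustration on synthetic data, not the Resampler framework used in the course; the helper name `resample_means` is a hypothetical choice made here.

```python
import numpy as np

def resample_means(data, rng, iters=1000):
    """Draw `iters` resamples with replacement and return their means."""
    n = len(data)
    return np.array([
        rng.choice(data, size=n, replace=True).mean()
        for _ in range(iters)
    ])

# Synthetic sample (made up for illustration only).
rng = np.random.default_rng(seed=2)
sample = rng.normal(loc=72.0, scale=16.0, size=100)

sampling_dist = resample_means(sample, rng)

# Standard deviation describes the spread of the data themselves.
std_dev = sample.std(ddof=1)
# Standard error describes the spread of the estimate (here, the mean).
std_err = sampling_dist.std()
# A 90% confidence interval from the 5th and 95th percentiles.
ci_low, ci_high = np.percentile(sampling_dist, [5, 95])

print(f"standard deviation: {std_dev:.2f}")
print(f"standard error:     {std_err:.2f}")
print(f"90% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
```

The standard error comes out much smaller than the standard deviation because it shrinks as the sample size grows, while the standard deviation of the data does not.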
Break (10 minutes)
Hypothesis testing (50 minutes)
 Lecture: The logic of the null hypothesis significance test (NHST); interpreting pvalues; the limitations of hypothesis testing
 Hands-on exercises: Test difference in means by permutation; use the HypothesisTest framework
 Q&A
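A permutation test for a difference in means can be sketched as follows. This is a simplified illustration on synthetic data, not the HypothesisTest framework from the course; the function name `permutation_test` and its parameters are assumptions made for this example.

```python
import numpy as np

def permutation_test(group1, group2, iters=1000, seed=3):
    """Estimate a two-sided p-value for the difference in means.

    Under the null hypothesis the group labels are interchangeable,
    so we shuffle the pooled data and count how often the shuffled
    difference is at least as large as the observed one.
    """
    rng = np.random.default_rng(seed)
    group1 = np.asarray(group1, dtype=float)
    group2 = np.asarray(group2, dtype=float)
    observed = abs(group1.mean() - group2.mean())
    pooled = np.concatenate([group1, group2])
    n1 = len(group1)

    count = 0
    for _ in range(iters):
        shuffled = rng.permutation(pooled)
        stat = abs(shuffled[:n1].mean() - shuffled[n1:].mean())
        if stat >= observed:
            count += 1
    return count / iters

# Synthetic groups (made up for illustration only).
rng = np.random.default_rng(seed=4)
group_a = rng.normal(loc=0.0, scale=1.0, size=50)
group_b = rng.normal(loc=1.0, scale=1.0, size=50)

p_value = permutation_test(group_a, group_b)
print(f"estimated p-value: {p_value:.3f}")
```

A small p-value says only that a difference this large would be unlikely if the labels were interchangeable; it does not measure the size of the effect or the probability that the hypothesis is true.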
Other resources, wrap-up, and Q&A (20 minutes)