Practical Statistics for Data Scientists

Book Description

Statistics and machine learning are key components of data science, but only a small proportion of data scientists are actually trained as statisticians. This concise guide illustrates how to apply statistical concepts essential to data science, with advice on how to avoid their misuse.

Many courses and books teach basic statistics, but rarely from a data science perspective. And while many data science resources incorporate statistical methods, they typically lack a deep statistical perspective. This quick reference book bridges that gap in an accessible, readable format.

Table of Contents

  1. Preface
    1. What to Expect
    2. Conventions Used in This Book
    3. Using Code Examples
    4. Safari® Books Online
    5. How to Contact Us
    6. Acknowledgments
  2. 1. Exploratory Data Analysis
    1. Elements of Structured Data
      1. Further Reading
    2. Rectangular Data
      1. Data Frames and Indexes
      2. Graph Data
      3. Further Reading
    3. Estimates of Location
      1. Mean
      2. Median and Robust Estimates
      3. Example: Location Estimates of Population and Murder Rates
      4. Further Reading
    4. Estimates of Variability
      1. Standard Deviation and Related Estimates
      2. Estimates Based on Percentiles
      3. Example: Variability Estimates of State Population
      4. Further Reading
    5. Exploring the Data Distribution
      1. Percentiles and Boxplots
      2. Frequency Table and Histograms
      3. Density Estimates
      4. Further Reading
    6. Exploring Binary and Categorical Data
      1. Mode
      2. Expected Value
      3. Further Reading
    7. Correlation
      1. Scatterplots
    8. Exploring Two or More Variables
      1. Hexagonal Binning and Contours (Plotting Numeric vs. Numeric)
      2. Two Categorical Variables
      3. Categorical and Numeric Data
      4. Visualizing Multiple Variables
      5. Further Reading
    9. Conclusion
  3. 2. Data and Sampling Distributions
    1. Random Sampling and Sample Bias
      1. Bias
      2. Random Selection
      3. Size Versus Quality
      4. Sample Mean Versus Population Mean
      5. Further Reading
    2. Selection Bias
      1. Regression to the Mean
      2. Further Reading
    3. Sampling Distribution of a Statistic
      1. Central Limit Theorem
      2. Standard Error
      3. Further Reading
    4. The Bootstrap
      1. Resampling Versus Bootstrapping
      2. Further Reading
    5. Confidence Intervals
      1. Further Reading
    6. Normal Distribution
      1. Standard Normal and QQ-Plots
    7. Long-Tailed Distributions
      1. Further Reading
    8. Student’s t Distribution
      1. Further Reading
    9. Binomial Distribution
      1. Further Reading
    10. Poisson and Related Distributions
      1. Poisson Distributions
      2. Exponential Distribution
      3. Inference
      4. Weibull Distribution
      5. Further Reading
    11. Summary