## Book Description

With Early Release ebooks, you get books in their earliest form, the author's raw and unedited content as they write, so you can take advantage of these technologies long before the official release of these titles. You'll also receive updates when significant changes are made, new chapters are available, and the final ebook bundle is released.

Statistics and machine learning are key components of data science, but only a small proportion of data scientists are actually trained as statisticians. This concise guide illustrates how to apply statistical concepts essential to data science, with advice on how to avoid their misuse.

Many courses and books teach basic statistics, but rarely from a data science perspective. And while many data science resources incorporate statistical methods, they typically lack a deep statistical perspective. This quick reference book bridges that gap in an accessible, readable format.

1. Preface
2. 1. Exploratory Data Analysis
    1. Elements of Structured Data
    2. Rectangular Data
    3. Estimates of Location
    4. Estimates of Variability
    5. Exploring the Data Distribution
    6. Exploring Binary and Categorical Data
    7. Correlation
    8. Exploring Two or More Variables
    9. Summary
3. 2. Data and Sampling Distributions
    1. Random sampling and sample bias
    2. Selection bias
    3. Sampling Distribution of a Statistic
    4. The Bootstrap
    5. Confidence intervals
    6. Normal distribution
    7. Long-Tailed Distributions
    8. Student’s t distribution
    9. Binomial distribution
    10. Poisson and Related Distributions
    11. Summary
4. 3. Statistical Experiments and Significance Testing
    1. A-B Testing
    2. Hypothesis Test
    3. Resampling
    4. Statistical Significance and P-values
    5. t-test
    6. Multiple Testing
    7. Degrees of freedom
    8. ANOVA
    9. Chi-square test
    10. Multi-arm bandit algorithm
    11. Power and sample size
    12. Summary
5. 4. Regression and Prediction
    1. Simple Linear Regression
    2. Multiple Linear Regression
    3. Prediction Using Regression
    4. Factor Variables in Regression
    5. Interpreting the Regression Equation
    6. Testing the Assumptions - Regression Diagnostics
    7. Polynomial and Spline Regression
    8. Summary
6. 5. Classification
    1. Naive Bayes
    2. Discriminant Analysis
    3. Logistic regression
    4. Evaluating Classification Models
    5. Strategies for Imbalanced Data
    6. Summary
7. 6. Statistical Machine Learning
    1. K-Nearest-Neighbors (KNN)
    2. Tree Models
    3. Bagging and the Random Forest
    4. Boosting
8. 7. Unsupervised Learning
    1. Principal Components Analysis (PCA)
    2. K-Means Clustering
    3. Hierarchical Clustering
    4. Model Based Clustering
    5. Scaling and Categorical Variables
    6. Summary
9. Bibliography
10. Index