## Book Description

Turning raw data into something useful requires that you know how to extract precisely what you need. With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications.

1. Data Analysis with Open Source Tools
2. Dedication
4. A Note Regarding Supplemental Files
5. Preface
6. Chapter 1. Introduction
7. Part I. Graphics: Looking at Data
    1. Chapter 2. A Single Variable: Shape and Distribution
        1. Dot and Jitter Plots
        2. Histograms and Kernel Density Estimates
        3. The Cumulative Distribution Function
        4. Rank-Order Plots and Lift Charts
        5. Only When Appropriate: Summary Statistics and Box Plots
        6. Workshop: NumPy
    2. Chapter 3. Two Variables: Establishing Relationships
        1. Scatter Plots
        2. Conquering Noise: Smoothing
        3. Logarithmic Plots
        4. Banking
        5. Linear Regression and All That
        6. Showing What’s Important
        7. Graphical Analysis and Presentation Graphics
        8. Workshop: matplotlib
    3. Chapter 4. Time As a Variable: Time-Series Analysis
        1. Examples
        3. Smoothing
        4. Don’t Overlook the Obvious!
        5. The Correlation Function
        6. Optional: Filters and Convolutions
        7. Workshop: scipy.signal
    4. Chapter 5. More Than Two Variables: Graphical Multivariate Analysis
        1. False-Color Plots
        2. A Lot at a Glance: Multiplots
        3. Composition Problems
        4. Novel Plot Types
        5. Interactive Explorations
        6. Workshop: Tools for Multivariate Graphics
    5. Chapter 6. Intermezzo: A Data Analysis Session
8. Part II. Analytics: Modeling Data
    1. Chapter 7. Guesstimation and the Back of the Envelope
        1. Principles of Guesstimation
            1. Estimating Sizes
            2. Establishing Relationships
            3. Working with Numbers
            4. More Examples
            5. Things I Know
        2. How Good Are Those Numbers?
        3. Optional: A Closer Look at Perturbation Theory and Error Propagation
        4. Workshop: The GNU Scientific Library (GSL)
    2. Chapter 8. Models from Scaling Arguments
        1. Models
        2. Arguments from Scale
        3. Mean-Field Approximations
        4. Common Time-Evolution Scenarios
        5. Case Study: How Many Servers Are Best?
        6. Why Modeling?
        7. Workshop: Sage
    3. Chapter 9. Arguments from Probability Models
        1. The Binomial Distribution and Bernoulli Trials
        2. The Gaussian Distribution and the Central Limit Theorem
        3. Power-Law Distributions and Non-Normal Statistics
        4. Other Distributions
        5. Optional: Case Study—Unique Visitors over Time
        6. Workshop: Power-Law Distributions
    4. Chapter 10. What You Really Need to Know About Classical Statistics
        1. Genesis
        2. Statistics Defined
        3. Statistics Explained
        4. Controlled Experiments Versus Observational Studies
        5. Optional: Bayesian Statistics—The Other Point of View
        6. Workshop: R
    5. Chapter 11. Intermezzo: Mythbusting—Bigfoot, Least Squares, and All That
        1. How to Average Averages
        2. The Standard Deviation
        3. Least Squares
9. Part III. Computation: Mining Data
    1. Chapter 12. Simulations
        1. A Warm-Up Question
        2. Monte Carlo Simulations
        3. Resampling Methods
        4. Workshop: Discrete Event Simulations with SimPy
    2. Chapter 13. Finding Clusters
        1. What Constitutes a Cluster?
        2. Distance and Similarity Measures
            1. Common Distance and Similarity Measures
        3. Clustering Methods
        4. Pre- and Postprocessing
        5. Other Thoughts
        6. A Special Case: Market Basket Analysis
        7. A Word of Warning
        8. Workshop: Pycluster and the C Clustering Library
    3. Chapter 14. Seeing the Forest for the Trees: Finding Important Attributes
        1. Principal Component Analysis
            1. Motivation
            2. Optional: Theory
            3. Interpretation
            4. Computation
            5. Practical Points
        2. Visual Techniques
        3. Kohonen Maps
        4. Workshop: PCA with R
    4. Chapter 15. Intermezzo: When More Is Different
10. Part IV. Applications: Using Data
    1. Chapter 16. Reporting, Business Intelligence, and Dashboards
        2. Corporate Metrics and Dashboards
        3. Data Quality Issues
        4. Workshop: Berkeley DB and SQLite
    2. Chapter 17. Financial Calculations and Modeling
        1. The Time Value of Money
        2. Uncertainty in Planning and Opportunity Costs
        3. Cost Concepts and Depreciation
        4. Should You Care?
        5. Is This All That Matters?
        6. Workshop: The Newsvendor Problem
    3. Chapter 18. Predictive Analytics
        1. Topics in Predictive Analytics
        2. Some Classification Terminology
        3. Algorithms for Classification
        4. The Process
        5. The Secret Sauce
        6. The Nature of Statistical Learning
        7. Workshop: Two Do-It-Yourself Classifiers
    4. Chapter 19. Epilogue: Facts Are Not Reality
11. Appendix A. Programming Environments for Scientific Computation and Data Analysis
    1. Software Tools
    2. A Catalog of Scientific Software
        1. Matlab
        2. R
        3. Python
        5. Other Players
        6. Recommendations
12. Appendix B. Results from Calculus
    1. Common Functions
    2. Calculus
    3. Useful Tricks
    4. Notation and Basic Math
        2. Elementary Algebra
        3. Working with Fractions
        4. Sets, Sequences, and Series
        5. Special Symbols
        6. The Greek Alphabet
    5. Where to Go from Here
13. Appendix C. Working with Data
    1. Sources for Data
    2. Cleaning and Conditioning
    3. Sampling
    4. Data File Formats
    5. The Care and Feeding of Your Data Zoo
    6. Skills
    7. Terminology