
## Book Description

Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the "data-analytic thinking" necessary for extracting useful knowledge and business value from the data you collect. This guide also helps you understand the many data-mining techniques in use today.

Based on an MBA course Provost has taught at New York University over the past ten years, Data Science for Business provides examples of real-world business problems to illustrate these principles. You’ll learn not only how to improve communication between business stakeholders and data scientists, but also how to participate intelligently in your company’s data science projects. You’ll also discover how to think data-analytically and fully appreciate how data science methods can support business decision-making.

• Understand how data science fits in your organization—and how you can use it for competitive advantage
• Treat data as a business asset that requires careful investment if you’re to gain real value
• Approach business problems data-analytically, using the data-mining process to gather good data in the most appropriate way
• Learn general concepts for actually extracting knowledge from data
• Apply data science principles when interviewing data science job candidates

• Praise
• Dedication
• Preface
• 1. Introduction: Data-Analytic Thinking
• 2. Business Problems and Data Science Solutions
    2. Supervised Versus Unsupervised Methods
    3. Data Mining and Its Results
    4. The Data Mining Process
    5. Implications for Managing the Data Science Team
    6. Other Analytics Techniques and Technologies
    7. Summary
• 3. Introduction to Predictive Modeling: From Correlation to Supervised Segmentation
    1. Models, Induction, and Prediction
    2. Supervised Segmentation
    3. Visualizing Segmentations
    4. Trees as Sets of Rules
    5. Probability Estimation
    6. Example: Addressing the Churn Problem with Tree Induction
    7. Summary
• 4. Fitting a Model to Data
    1. Classification via Mathematical Functions
    2. Regression via Mathematical Functions
    3. Class Probability Estimation and Logistic “Regression”
    4. Example: Logistic Regression versus Tree Induction
    5. Nonlinear Functions, Support Vector Machines, and Neural Networks
    6. Summary
• 5. Overfitting and Its Avoidance
    1. Generalization
    2. Overfitting
    3. Overfitting Examined
    4. Example: Overfitting Linear Functions
    5. * Example: Why Is Overfitting Bad?
    6. From Holdout Evaluation to Cross-Validation
    7. The Churn Dataset Revisited
    8. Learning Curves
    9. Overfitting Avoidance and Complexity Control
    10. Summary
• 6. Similarity, Neighbors, and Clusters
    1. Similarity and Distance
    2. Nearest-Neighbor Reasoning
        1. Example: Whiskey Analytics
        2. Nearest Neighbors for Predictive Modeling
        3. How Many Neighbors and How Much Influence?
        4. Geometric Interpretation, Overfitting, and Complexity Control
        5. Issues with Nearest-Neighbor Methods
    3. Some Important Technical Details Relating to Similarities and Neighbors
    4. Clustering
        1. Example: Whiskey Analytics Revisited
        2. Hierarchical Clustering
        3. Nearest Neighbors Revisited: Clustering Around Centroids
        4. Example: Clustering Business News Stories
        5. Understanding the Results of Clustering
        6. * Using Supervised Learning to Generate Cluster Descriptions
    5. Stepping Back: Solving a Business Problem Versus Data Exploration
    6. Summary
• 7. Decision Analytic Thinking I: What Is a Good Model?
    1. Evaluating Classifiers
    2. Generalizing Beyond Classification
    3. A Key Analytical Framework: Expected Value
        1. Using Expected Value to Frame Classifier Use
        2. Using Expected Value to Frame Classifier Evaluation
    4. Evaluation, Baseline Performance, and Implications for Investments in Data
    5. Summary
• 8. Visualizing Model Performance
• 9. Evidence and Probabilities
    2. Combining Evidence Probabilistically
    3. Applying Bayes’ Rule to Data Science
    4. A Model of Evidence “Lift”
    5. Example: Evidence Lifts from Facebook “Likes”
    6. Summary
• 10. Representing and Mining Text
    1. Why Text Is Important
    2. Why Text Is Difficult
    3. Representation
    4. Example: Jazz Musicians
    5. * The Relationship of IDF to Entropy
    6. Beyond Bag of Words
    7. Example: Mining News Stories to Predict Stock Price Movement
    8. Summary
• 11. Decision Analytic Thinking II: Toward Analytical Engineering
    1. Targeting the Best Prospects for a Charity Mailing
    2. Our Churn Example Revisited with Even More Sophistication
• 12. Other Data Science Tasks and Techniques
    1. Co-occurrences and Associations: Finding Items That Go Together
    2. Profiling: Finding Typical Behavior
    3. Link Prediction and Social Recommendation
    4. Data Reduction, Latent Information, and Movie Recommendation
    5. Bias, Variance, and Ensemble Methods
    6. Data-Driven Causal Explanation and a Viral Marketing Example
    7. Summary
• 13. Data Science and Business Strategy
    1. Thinking Data-Analytically, Redux
    2. Achieving Competitive Advantage with Data Science
    3. Sustaining Competitive Advantage with Data Science
    4. Attracting and Nurturing Data Scientists and Their Teams
    5. Examine Data Science Case Studies
    6. Be Ready to Accept Creative Ideas from Any Source
    7. Be Ready to Evaluate Proposals for Data Science Projects
    8. A Firm’s Data Science Maturity
• 14. Conclusion
• A. Proposal Review Guide
• B. Another Sample Proposal
    1. Scenario and Proposal
• Glossary
• C. Bibliography
• Index
• Colophon