A Practical Guide to Data Mining for Business and Industry

Book Description

Data mining is well on its way to becoming a recognized discipline in the overlapping areas of IT, statistics, machine learning, and AI. A Practical Guide to Data Mining for Business and Industry presents a user-friendly approach to data mining methods and covers the typical uses to which they are applied. The methodology is complemented by case studies, making the book a versatile reference: readers can look up specific methods as well as specific applications. The book is structured so that statisticians, computer scientists, and economists can cross-reference from a particular application or method to their sectors of interest.

Table of Contents

  Cover
  Title page
  Copyright page
  Glossary of terms
  Part I: Data mining concept
    1 Introduction
      1.1 Aims of the Book
      1.2 Data Mining Context
      1.3 Global Appeal
      1.4 Example Datasets Used in This Book
      1.5 Recipe Structure
      1.6 Further Reading and Resources
    2 Data mining definition
      2.1 Types of Data Mining Questions
      2.2 Data Mining Process
      2.3 Business Task: Clarification of the Business Question behind the Problem
      2.4 Data: Provision and Processing of the Required Data
      2.5 Modelling: Analysis of the Data
      2.6 Evaluation and Validation during the Analysis Stage
      2.7 Application of Data Mining Results and Learning from the Experience
  Part II: Data mining Practicalities
    3 All about data
      3.1 Some Basics
      3.2 Data Partition: Random Samples for Training, Testing and Validation
      3.3 Types of Business Information Systems
      3.4 Data Warehouses
      3.5 Three Components of a Data Warehouse: DBMS, DB and DBCS
      3.6 Data Marts
      3.7 A Typical Example from the Online Marketing Area
      3.8 Unique Data Marts
      3.9 Data Mart: Do’s and Don’ts
    4 Data Preparation
      4.1 Necessity of Data Preparation
      4.2 From Small and Long to Short and Wide
      4.3 Transformation of Variables
      4.4 Missing Data and Imputation Strategies
      4.5 Outliers
      4.6 Dealing with the Vagaries of Data
      4.7 Adjusting the Data Distributions
      4.8 Binning
      4.9 Timing Considerations
      4.10 Operational Issues
    5 Analytics
      5.1 Introduction
      5.2 Basis of Statistical Tests
      5.3 Sampling
      5.4 Basic Statistics for Pre-analytics
      5.5 Feature Selection/Reduction of Variables
      5.6 Time Series Analysis
    6 Methods
      6.1 Methods Overview
      6.2 Supervised Learning
      6.3 Multiple Linear Regression for Use When Target is Continuous
      6.4 Regression When the Target is Not Continuous
      6.5 Decision Trees
      6.6 Neural Networks
      6.7 Which Method Produces the Best Model? A Comparison of Regression, Decision Trees and Neural Networks
      6.8 Unsupervised Learning
      6.9 Cluster Analysis
      6.10 Kohonen Networks and Self-Organising Maps
      6.11 Group Purchase Methods: Association and Sequence Analysis
    7 Validation and Application
      7.1 Introduction to Methods for Validation
      7.2 Lift and Gain Charts
      7.3 Model Stability
      7.4 Sensitivity Analysis
      7.5 Threshold Analytics and Confusion Matrix
      7.6 ROC Curves
      7.7 Cross-Validation and Robustness
      7.8 Model Complexity
  Part III: Data mining in action
    8 Marketing
      8.1 Recipe 1: Response Optimisation: To Find and Address the Right Number of Customers
      8.2 Recipe 2: To Find the x% of Customers with the Highest Affinity to an Offer
      8.3 Recipe 3: To Find the Right Number of Customers to Ignore
      8.4 Recipe 4: To Find the x% of Customers with the Lowest Affinity to an Offer
      8.5 Recipe 5: To Find the x% of Customers with the Highest Affinity to Buy
      8.6 Recipe 6: To Find the x% of Customers with the Lowest Affinity to Buy
      8.7 Recipe 7: To Find the x% of Customers with the Highest Affinity to a Single Purchase
      8.8 Recipe 8: To Find the x% of Customers with the Highest Affinity to Sign a Long-Term Contract in Communication Areas
      8.9 Recipe 9: To Find the x% of Customers with the Highest Affinity to Sign a Long-Term Contract in Insurance Areas
    9 Intra-Customer Analysis
      9.1 Recipe 10: To Find the Optimal Amount of Single Communication to Activate One Customer
      9.2 Recipe 11: To Find the Optimal Communication Mix to Activate One Customer
      9.3 Recipe 12: To Find and Describe Homogeneous Groups of Products
      9.4 Recipe 13: To Find and Describe Groups of Customers with Homogeneous Usage
      9.5 Recipe 14: To Predict the Order Size of Single Products or Product Groups
      9.6 Recipe 15: Product Set Combination
      9.7 Recipe 16: To Predict the Future Customer Lifetime Value of a Customer
    10 Learning from a Small Testing Sample and Prediction
      10.1 Recipe 17: To Predict Demographic Signs (Like Sex, Age, Education and Income)
      10.2 Recipe 18: To Predict the Potential Customers of a Brand New Product or Service in Your Databases
      10.3 Recipe 19: To Understand Operational Features and General Business Forecasting
    11 Miscellaneous
      11.1 Recipe 20: To Find Customers Who Will Potentially Churn
      11.2 Recipe 21: Indirect Churn Based on a Discontinued Contract
      11.3 Recipe 22: Social Media Target Group Descriptions
      11.4 Recipe 23: Web Monitoring
      11.5 Recipe 24: To Predict Who is Likely to Click on a Special Banner
    12 Software and Tools
      12.1 List of Requirements When Choosing a Data Mining Tool
      12.2 Introduction to the Idea of Fully Automated Modelling (FAM)
      12.3 FAM Function
      12.4 FAM Architecture
      12.5 FAM Data Flows and Databases
      12.6 FAM Modelling Aspects
      12.7 FAM Challenges and Critical Success Factors
      12.8 FAM Summary
    13 Overviews
      13.1 To Make Use of Official Statistics
      13.2 How to Use Simple Maths to Make an Impression
      13.3 Differences between Statistical Analysis and Data Mining
      13.4 How to Use Data Mining in Different Industries
      13.5 Future Views
  Bibliography
  Index
  End User License Agreement