Part II Data Mining Practicalities
- 3 All about data
- 3.1 Some Basics
- 3.2 Data Partition: Random Samples for Training, Testing and Validation
- 3.3 Types of Business Information Systems
- 3.4 Data Warehouses
- 3.5 Three Components of a Data Warehouse: DBMS, DB and DBCS
- 3.6 Data Marts
- 3.7 A Typical Example from the Online Marketing Area
- 3.8 Unique Data Marts
- 3.9 Data Mart: Do’s and Don’ts
- 4 Data Preparation
- 4.1 Necessity of Data Preparation
- 4.2 From Small and Long to Short and Wide
- 4.3 Transformation of Variables
- 4.4 Missing Data and Imputation Strategies
- 4.5 Outliers
- 4.6 Dealing with the Vagaries of Data
- 4.7 Adjusting the Data Distributions
- 4.8 Binning
- 4.9 Timing Considerations
- 4.10 Operational Issues
- 5 Analytics
- 5.1 Introduction
- 5.2 Basis of Statistical Tests
- 5.3 Sampling
- 5.4 Basic Statistics for Pre-analytics
- 5.5 Feature Selection/Reduction of Variables
- 5.6 Time Series Analysis
- 6 Methods
- 6.1 Methods Overview
- 6.2 Supervised Learning
- 6.3 Multiple Linear Regression for use when Target is Continuous
- 6.4 Regression when the Target is not Continuous
- 6.5 Decision Trees
- 6.6 Neural Networks
- 6.7 Which Method Produces the Best Model? A Comparison of Regression, Decision Trees and Neural Networks
- 6.8 Unsupervised Learning
- 6.9 Cluster Analysis
- 6.10 Kohonen Networks and Self-Organising Maps
- 6.11 Group Purchase Methods: Association and Sequence Analysis