You are previewing Data Mining with Microsoft® SQL Server™ 2000 Technical Reference.
O'Reilly logo
Data Mining with Microsoft® SQL Server™ 2000 Technical Reference

Book Description

This book examines the SQL Server™ 2000 Analysis Services architecture and shows how data mining fits into its complete suite of information-extraction technologies. Then it demonstrates how to find nuggets of useful information with SQL Server.

Table of Contents

  1. Data Mining with Microsoft® SQL Server™ 2000: Technical Reference
    1. SPECIAL OFFER: Upgrade this ebook with O’Reilly
    2. Acknowledgments
    3. Introduction
      1. Who Should Use This Book
      2. What Is in This Book
    4. I. Introducing Data Mining
      1. 1. Understanding Data Mining
        1. What Is Data Mining?
        2. Why Use Data Mining?
        3. How Data Mining Is Currently Used
        4. Defining the Terms
        5. Data Mining Methodology
          1. Analyzing the Problem
          2. Extracting and Cleansing the Data
          3. Validating the Data
          4. Creating and Training the Model
          5. Querying the Data Mining Model Data
          6. Maintaining the Validity of the Data-Mining Model
        6. Overview of Microsoft Data Mining
          1. Data Mining vs. OLAP
          2. Data-Mining Models
          3. Data-Mining Algorithms
            1. Decision Trees
            2. Clustering
            3. Association
            4. Regression Analysis or Sequencing
            5. OLE DB for Data Mining
          4. Using SQL Server Syntax to Data Mine
        7. Summary
      2. 2. Microsoft SQL Server Analysis Services Architecture
        1. Introduction to OLAP
          1. MOLAP
          2. ROLAP
          3. HOLAP
        2. Server Architecture
          1. Data Mining Services Within Analysis Services
        3. Client Architecture
          1. PivotTable Service
          2. OLE DB
            1. OLE DB for OLAP
            2. OLE DB for Data Mining
          3. Decision Support Objects (DSO)
          4. Multidimensional Expressions (MDX)
          5. Prediction Joins
        4. Summary
      3. 3. Data Storage Models
        1. Why Data Mining Needs a Data Warehouse
          1. Maintaining Data Integrity
        2. Reporting Against OLTP Data Can Be Hazardous to Your Performance
        3. Data Warehousing Architecture for Data Mining
          1. Creating the Warehouse from OLTP Data
            1. Ensuring Data Quality
            2. When to Discard Data from the Warehouse
          2. Optimizing Data for Mining
            1. Data Structures
            2. Choosing Columns
              1. Input columns
              2. Target columns
              3. Key columns
              4. Value columns
            3. Columns to Avoid
            4. Selecting Rows
            5. Calculated and Derived Data
            6. Determining Data Granularity
          3. Physical Data Mining Structure
            1. Client (Single-Tier) Architecture
            2. Two-Tier Architecture
          4. Three-Tier Architecture
        4. Relational Data Warehouse
          1. Advantages of Relational Data Storage
            1. Flexibility
            2. Relational Databases Use Standards
            3. Easy Access to Lowest Level of Granularity
          2. Building Supporting Tables for Data Mining
        5. OLAP cubes
          1. How Data Mining Uses OLAP Structures
          2. Advantages of OLAP Storage
            1. Speed
            2. Easy Navigation
            3. OLAP Storage Requires Highly-Structured Dimensions
            4. Flat Tables vs. Multidimensional OLAP Tables
          3. When OLAP Is Not Appropriate for Data Mining
        6. Summary
      4. 4. Approaches to Data Mining
        1. Directed Data Mining
        2. Undirected Data Mining
          1. Data Mining vs. Statistics
            1. Population
            2. Sample
            3. Range
            4. Bias
            5. Mean
            6. Median
            7. Distribution
            8. Mode
            9. Variance
            10. Standard Deviation
            11. Correlation
            12. Regression
          2. Learning from Historical Data
            1. Influence Analysis
            2. Variation Analysis
            3. Comparison Analysis
            4. Cause and Effect Analysis
            5. Trend Analysis
            6. Deviation Analysis
          3. Predicting the Future
            1. Determining Probabilities
            2. Simulations and What-if Scenarios
        3. Training Data-Mining Models
          1. Evaluating the Models and Avoiding Errors
            1. Overfitting
              1. Pruning
              2. Chi-Squared Analysis
              3. Cross Validation
            2. Underfitting
            3. Preparing Data Models for Testing
        4. Summary
    5. II. Data-Mining Methods
      1. 5. Microsoft Decision Trees
        1. Creating the Model
          1. Analysis Manager
            1. Mushrooms Data-Mining Model
              1. Creating the Database
              2. Mining Model Wizard
              3. Select source
              4. Select case tables
              5. Select a data-mining technique
              6. Create and edit joins
              7. Select the Key column
              8. Select Input and Prediction columns
              9. Finish
            2. Relational Mining Model Editor
        2. Visualizing the Model
          1. Prediction Columns
          2. Dependency Network Browser
            1. Dependency Network Browser Helps Understand Models
            2. Dealing with Numerical Data
          3. Inside the Decision Tree Algorithm
            1. CART, CHAID, and C4.5
            2. Classification and Regression Trees (CART)
            3. Chi-Squared Automatic Interaction Detector (CHAID)
            4. C4.5
            5. How Splits Are Determined
              1. Calculating Diversity
        3. How Predictions Are Derived
          1. Navigating the Tree
            1. Problem Trees
          2. Navigation vs. Rules
          3. When to Use Decision Trees
        4. Summary
      2. 6. Creating Decision Trees with OLAP
        1. Creating the Model
          1. Select Source Type
          2. Select Source Cube and Data-Mining Technique
          3. Select Case
          4. Select Predicted Entity
            1. A Measure of the Source Cube
            2. Member Property of the Case Level
            3. Members of Another Dimension
          5. Select Training Data
          6. Select Dimension and Virtual Cube
            1. The Dimension
            2. The Virtual Cube
          7. Completing the Data-Mining Model
        2. OLAP Mining Model Editor
          1. Content Detail Pane
          2. Structure Panel
          3. Prediction Tree List
        3. Analyzing Data with the OLAP Data-Mining Model
          1. Using the Generated Virtual Cube
          2. Using the Generated Dimension
            1. What Is MDX?
        4. Summary
      3. 7. Microsoft Clustering
        1. The Search for Order
        2. Looking for Ways to Understand Data
        3. Clustering as an Undirected Data-Mining Technique
        4. How Clustering Works
          1. Overview of the Algorithm
          2. The K-Means Method Clustering Algorithm
            1. Finding the Clusters
            2. Finding the Center of the Cluster
            3. The Boundaries
          3. What Is Being Measured Exactly?
            1. Conceptual Attributes
          4. Clustering Factors
            1. Rankings
            2. Interval Values
            3. Measures
            4. Categorical Values
          5. Measuring "Closeness"
            1. Distance Between Points In Space
            2. Vector Angle Similarity
            3. The Record Overlap Problem
        5. When to Use Clustering
          1. Visualize Relationships
          2. Highlight Anomalies
          3. Create Samples for Other Data-Mining Efforts
          4. Weaknesses of Clustering
            1. Results Are Difficult to Understand
            2. Data Types Are Difficult to Compare
        6. Creating a Data-Mining Model Using Clustering
          1. Select Source Type
          2. Select the Table or Tables for Your Mining Model
          3. Select the Data-Mining Technique
          4. Edit Joins
          5. Select the Case Key Column for Your Mining Model
          6. Select the Input and Predictable Columns
            1. Finish
            2. Processing the Model
        7. Viewing the Model
          1. Organization of the Cluster Nodes
          2. Order of the Cluster Nodes
        8. Analyzing the Data
        9. Summary
    6. III. Creating Data-Mining Applications with Code
      1. 8. Using Microsoft Data Transformation Services (DTS)
        1. What Is DTS?
        2. DTS Tasks
          1. Transform
          2. Bulk Insert
          3. Data Driven Query
          4. Execute Package
            1. Transfer Error Messages
            2. Transfer Master Stored Procedures
            3. Transfer Databases
            4. Transfer Jobs
            5. Transfer Logins
            6. Copy SQL Server Objects
            7. Dynamic Properties
            8. Message Queue
            9. ActiveX Scripting
            10. Execute SQL
            11. Execute Process
            12. Send Mail
            13. FTP
            14. Analysis Services Processing Task
            15. Data-Mining Prediction Query Task
            16. Custom and Third-Party Tasks
        3. Connections
          1. Sources
            1. A Data Source Connection
            2. A File Connection
            3. A Data Link Connection
          2. Configuring a Connection
            1. Single Thread per Connection
            2. Dynamic Connection Properties
            3. Security Account Information
        4. DTS Package Workflow
          1. DTS Package Steps
          2. Precedence Constraints
            1. Unconditional
            2. On Success
            3. On Failure
            4. Using Multiple Precedence Constraints
        5. DTS Designer
          1. Opening the DTS Designer
            1. DTS Designer Work Area
          2. Saving a DTS Package
            1. Package Name
            2. Owner Password
            3. User Password
            4. Location
        6. dtsrun Utility
        7. Using DTS to Create a Data-Mining Model
          1. Preparing the SQL Server Environment
          2. Creating the Package
        8. Summary
      2. 9. Using Decision Support Objects (DSO)
        1. Scripting vs. Visual Basic
          1. The Server Object
          2. The Database Object
        2. Creating the Relational Data-Mining Model Using DSO
        3. Creating the OLAP Data-Mining Model Using DSO
          1. The DataSource Object
          2. Data-Mining Model (Decision Support Objects)
        4. Adding a New Data Source
        5. Analysis Server Roles
          1. Data-Mining Model Roles
            1. Adding a New Data-Mining Model Role
        6. Summary
      3. 10. Understanding Data-Mining Structures
        1. The Structure of the Data-Mining Model Case
          1. Data-Mining Models Look Like Tables
        2. Using Code to Browse Data-Mining Models
        3. Using the Schema Rowsets
          1. MINING_MODELS Schema Rowset
          2. MINING_COLUMNS Schema Rowset
          3. MINING_MODEL_CONTENT Schema Rowset
          4. MINING_SERVICES Schema Rowset
          5. SERVICE_PARAMETERS Schema Rowset
            1. Example Listing of Algorithms Available Using Code
          6. MODEL_CONTENT_PMML Schema Rowset
        4. Summary
      4. 11. Data Mining Using PivotTable Service
        1. Redistributing Components
        2. Installing and Registering Components
          1. File Locations
          2. Installation Registry Settings
          3. Redistribution Setup Programs
        3. Connecting to the PivotTable Service
          1. Connect to Analysis Services Using PivotTable Service
          2. Connect to Analysis Services Using HTTP
        4. Building a Local Data-Mining Model
          1. Storage of Local Mining Models
          2. SELECT INTO Statement
          3. INSERT INTO Statement
          4. OPENROWSET Syntax
          5. Nested Tables and the SHAPE Statement
        5. Using XML in Data Mining
          1. The PMML Standard
        6. Summary
      5. 12. Data-Mining Queries
        1. Components of a Prediction Query
          1. The Basic Prediction Query
          2. Specifying the Test Case Source
            1. Singleton Queries
          3. Specifying Columns
          4. The PREDICTION JOIN Clause
          5. Using Functions as Columns
          6. Using Tabular Values as Columns
          7. The WHERE Clause
          8. Prediction Functions
            1. Predict
            2. PredictProbability
            3. PredictSupport
            4. PredictVariance
            5. PredictStdev
            6. PredictProbabilityVariance
            7. PredictProbabilityStdev
            8. PredictHistogram
            9. TopCount
            10. TopSum
            11. TopPercent
            12. RangeMin
            13. RangeMid
            14. RangeMax
            15. PredictScore
            16. PredictNodeId
        2. Prediction Queries with Clustering Models
          1. Cluster
          2. ClusterProbability
          3. ClusterDistance
        3. Using DTS to Run Prediction Queries
        4. Summary
    7. A. Regression Analysis
      1. What Is Regression Analysis?
      2. Predicting Continuous Attributes: An Example
        1. Correlation
      3. The Regression Line
        1. Finding the Slope
        2. Finding the Intercept
        3. The Regression Coefficient
      4. Using Regression Analysis to Make Predictions
      5. Analyzing the Accuracy of the Regression Line
      6. Using OLAP to Create Regression Models
      7. Applying Regression to a Relational Database
        1. Regression with SQL Server vs. Regression Using OLAP
      8. Using Visual Basic to Perform Regression Analysis
      9. Creating the Models
        1. Using a Table
        2. Using PMML
      10. Summary
    8. Glossary of Data-Mining Terms
    9. B. About the Author
    10. Index
    11. About the Author
    12. SPECIAL OFFER: Upgrade this ebook with O’Reilly