IBM SPSS Modeler Cookbook

Book Description

If you’ve already had some experience with IBM SPSS Modeler, this cookbook will help you delve deeper and exploit the incredible potential of this data mining workbench. The recipes come from some of the best brains in the business.

  • Go beyond mere insight and build models that you can deploy in the day-to-day running of your business

  • Save time and effort while getting more value from your data than ever before

  • Loaded with detailed step-by-step examples that show you exactly how it’s done by the best in the business

In Detail

    IBM SPSS Modeler is a data mining workbench that enables you to explore data, identify important relationships that you can leverage, and build predictive models quickly, allowing your organization to base its decisions on hard data, not hunches or guesswork.

    IBM SPSS Modeler Cookbook takes you beyond the basics and shares the tips, the timesavers, and the workarounds that experts use to increase productivity and extract maximum value from data. The authors of this book are among the very best of these exponents, gurus who, in their brilliant and imaginative use of the tool, have pushed back the boundaries of applied analytics. By reading this book, you are learning from practitioners who have helped define the state of the art.

    Follow the industry-standard data mining process, gaining new skills at each stage, from loading data to integrating results into everyday business practices. Get a handle on the most efficient ways of extracting data from your own sources and preparing it for exploration and modeling. Master the best methods for building models that will perform well in the workplace.

    Go beyond the basics and get the full power of your data mining workbench with this practical guide.

    Table of Contents

    1. IBM SPSS Modeler Cookbook
      1. Table of Contents
      2. IBM SPSS Modeler Cookbook
      3. Credits
      4. Foreword
      5. About the Authors
      6. About the Reviewers
      7. www.PacktPub.com
        1. Support files, eBooks, discount offers, and more
          1. Why Subscribe?
          2. Free Access for Packt account holders
          3. Instant Updates on New Packt Books
      8. Preface
        1. What is CRISP-DM?
        2. Data mining is a business process
        3. The IBM SPSS Modeler workbench
          1. A brief history of the Clementine workbench
        4. Historical introduction to scripting
        5. What this book covers
        6. Who this book is for
        7. Conventions
        8. Reader feedback
        9. Customer support
          1. Downloading the example code
          2. Errata
          3. Piracy
          4. Questions
      9. 1. Data Understanding
        1. Introduction
        2. Using an empty aggregate to evaluate sample size
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. A modified version
          5. See also
        3. Evaluating the need to sample from the initial data
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        4. Using CHAID stumps when interviewing an SME
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        5. Using a single cluster K-means as an alternative to anomaly detection
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        6. Using an @NULL multiple Derive to explore missing data
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        7. Creating an Outlier report to give to SMEs
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        8. Detecting potential model instability early using the Partition node and Feature Selection node
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
      10. 2. Data Preparation – Select
        1. Introduction
        2. Using the Feature Selection node creatively to remove or decapitate perfect predictors
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        3. Running a Statistics node on anti-join to evaluate the potential missing data
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        4. Evaluating the use of sampling for speed
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        5. Removing redundant variables using correlation matrices
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        6. Selecting variables using the CHAID Modeling node
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        7. Selecting variables using the Means node
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        8. Selecting variables using single-antecedent Association Rules
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
      11. 3. Data Preparation – Clean
        1. Introduction
        2. Binning scale variables to address missing data
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        3. Using a full data model/partial data model approach to address missing data
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        4. Imputing in-stream mean or median
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        5. Imputing missing values randomly from uniform or normal distributions
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        6. Using random imputation to match a variable's distribution
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        7. Searching for similar records using a Neural Network for inexact matching
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        8. Using neuro-fuzzy searching to find similar names
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        9. Producing longer Soundex codes
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
      12. 4. Data Preparation – Construct
        1. Introduction
        2. Building transformations with multiple Derive nodes
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        3. Calculating and comparing conversion rates
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        4. Grouping categorical values
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        5. Transforming high skew and kurtosis variables with a multiple Derive node
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        6. Creating flag variables for aggregation
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        7. Using Association Rules for interaction detection/feature creation
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        8. Creating time-aligned cohorts
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
      13. 5. Data Preparation – Integrate and Format
        1. Introduction
        2. Speeding up merge with caching and optimization settings
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        3. Merging a lookup table
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        4. Shuffle-down (nonstandard aggregation)
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        5. Cartesian product merge using key-less merge by key
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        6. Multiplying out using Cartesian product merge, user source, and derive dummy
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        7. Changing large numbers of variable names without scripting
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        8. Parsing nonstandard dates
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Nesting functions into one Derive node
            2. Performing clean downstream of a calculation using a Filter node
            3. Using parameters instead of constants in calculations
          5. See also
        9. Parsing and performing a conversion on a complex stream
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        10. Sequence processing
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
      14. 6. Selecting and Building a Model
        1. Introduction
        2. Evaluating balancing with Auto Classifier
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        3. Building models with and without outliers
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        4. Using Neural Network for Feature Selection
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        5. Creating a bootstrap sample
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        6. Creating bagged logistic regression models
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        7. Using KNN to match similar cases
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        8. Using Auto Classifier to tune models
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        9. Next-Best-Offer for large datasets
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
      15. 7. Modeling – Assessment, Evaluation, Deployment, and Monitoring
        1. Introduction
        2. How (and why) to validate as well as test
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        3. Using classification trees to explore the predictions of a Neural Network
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        4. Correcting a confusion matrix for an imbalanced target variable by incorporating priors
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        5. Using aggregate to write cluster centers to Excel for conditional formatting
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        6. Creating a classification tree financial summary using aggregate and an Excel Export node
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        7. Reformatting data for reporting with a Transpose node
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        8. Changing formatting of fields in a Table node
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        9. Combining generated filters
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
      16. 8. CLEM Scripting
        1. Introduction
          1. CLEM scripting best practices
          2. CLEM scripting shortcomings
        2. Building iterative Neural Network forecasts
          1. Getting ready
          2. How to do it...
          3. How it works...
            1. Script section 1
          4. There's more...
        3. Quantifying variable importance with Monte Carlo simulation
          1. Getting ready
          2. How to do it...
          3. How it works...
            1. Script section 1
            2. Script section 2
          4. There's more...
        4. Implementing champion/challenger model management
          1. Getting ready
          2. How to do it...
          3. How it works...
            1. Script section 1
            2. Script section 2
          4. There's more...
        5. Detecting outliers with the jackknife method
          1. Getting ready
          2. How to do it...
          3. How it works...
            1. Script section 1
            2. Script section 2
            3. Script section 3
          4. There's more...
        6. Optimizing K-means cluster solutions
          1. Getting ready
          2. How to do it...
          3. How it works...
            1. Script section 1
            2. Script section 2
            3. Script section 3
            4. Script section 4
          4. There's more...
        7. Automating time series forecasts
          1. Getting ready
          2. How to do it...
          3. How it works...
            1. Script section 1
            2. Script section 2
          4. There's more...
        8. Automating HTML reports and graphs
          1. Getting ready
          2. How to do it...
          3. How it works...
            1. Script section 1
            2. Script section 2
            3. Script section 3
          4. There's more...
        9. Rolling your own modeling algorithm – Weibull analysis
          1. Getting ready
          2. How to do it...
          3. How it works...
            1. Script section 1
          4. There's more...
      17. A. Business Understanding
        1. Introduction
          1. What decisions are you trying to make using data?
        2. Define business objectives by Tom Khabaza
          1. The importance of business objectives in data mining
          2. Defining the business objectives of a data mining project
            1. Understanding the goals of the business
            2. Understanding the objectives of your client
            3. Connecting specific objectives to analytical results
            4. Specifying your data mining goals
        3. Assessing the situation by Meta Brown
          1. Taking inventory of resources
          2. Reviewing requirements, assumptions, and constraints
          3. Identifying risks and defining contingencies
          4. Defining terminology
          5. Evaluating costs and benefits
        4. Translating your business objective into a data mining objective by Dean Abbott
          1. The key to the translation – specifying target variables
          2. Data mining success criteria – measuring how good the models actually are
            1. Success criteria for classification
            2. Success criteria for estimation
            3. Other customized success criteria
        5. Produce a project plan – ensuring a realistic timeline by Keith McCormick
          1. Business understanding
          2. Data understanding
          3. Data preparation
          4. Modeling
          5. Evaluation
          6. Deployment
      18. Index