Chapter 11 SPSS Statistics versus SPSS Modeler: Can I Be a Data Miner Using SPSS Statistics?

In this chapter, I will attempt to answer three questions:

  • What is “data mining,” and how is it different from statistics?
  • What is the SPSS Modeler data mining workbench?
  • Is it possible to perform data mining tasks effectively in SPSS Statistics?

Our discussion focuses on two case studies that will help us address these questions. One case study has a continuous dependent variable (a “target,” as Modeler users would call it), and the other has a binary dependent variable. Along the way, we will pick up a number of tips and tricks. As you may have guessed, it is indeed possible to do data mining effectively in SPSS Statistics, but it is not always obvious how to perform each task, or even which tasks are required.

What Is Data Mining?

My own definition of data mining has evolved slightly over the years, but this one has served me well:

Data mining uses historical data, accumulated during the normal course of doing business, and involves selecting, preparing, and analyzing the data, finding (and confirming) previously unknown patterns, building predictive models, and deploying the models on current data.

Each element of the definition is worth elaborating:

  • Historical data: Data mining needs data for which the outcome of interest is already known. The resulting model is then applied to newer data for which the outcome is not yet known but can be predicted.
  • Normal course ...
