Chapter 10. Incorporating Data Mining

We dig up diamonds by the score
A thousand rubies, sometimes more

From Snow White by Walt Disney Company, music by Frank Churchill, words by Larry Morey, ©1938

Data mining is not a single topic; it is a loosely related collection of tools, algorithms, techniques, and processes. This breadth makes it a difficult subject to tackle, especially in a single chapter. However, we must tackle it for two main reasons: first, data mining offers the potential for huge business impact; and second, SQL Server 2005 includes a suite of data mining tools as part of the product. In short, the value is high and the cost is low, so the motivation is obvious.

The first part of this chapter sets the context for data mining. We begin with a brief definition of data mining and an overview of the business motivation for using it. We then look at the Microsoft data mining architecture and environment provided as part of SQL Server 2005, including a brief description of the data mining service, the algorithms provided, and the kinds of problems for which each might be appropriate. Next, we present a high-level data mining process, which is divided into three phases: business, data mining, and operations. The business phase involves identifying business opportunities and understanding the available data resources. The data mining phase is a highly iterative, exploratory process whose goal is to identify the best model possible within the available time and resource constraints. Once you identify the best model, ...

From The Microsoft® Data Warehouse Toolkit: With SQL Server™ 2005 and the Microsoft® Business Intelligence Toolset.
