Chapter 14. Data Mining with SQL Server Integration Services

In a typical data mining project, data preparation is the most resource-consuming step; creating and tuning mining models may represent only 20 percent of the total project effort. Before you can create these models, your data must be in the right format. Data preparation consists of multiple steps, including data gathering, cleaning, and transformation. You can prepare the data using SQL scripts, but a better tool exists for this purpose: SQL Server Integration Services (SSIS).

SSIS provides a workflow environment in which you build data transformation packages. You can extract data from different data sources and perform a sequence of operations on it. Many such operations are predefined and provided as components in the SSIS Toolbox, and you can extend this set with your own custom operations. After transforming your data, you can use it to process a data mining model or to execute prediction queries directly inside the SSIS environment.
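The idea of extracting data and passing it through a sequence of operations can be illustrated outside SSIS with a small Python sketch. This is only a conceptual analogy, not SSIS code: the sample data, the function names (extract, clean, transform, run_pipeline), and the income threshold are all hypothetical and chosen purely for illustration.

```python
import csv
import io

# Hypothetical in-memory data standing in for a real data source.
RAW_CSV = """customer_id,age,income
1,34,52000
2,,61000
3,45,not available
4,29,48000
"""

def extract(text):
    """Source step: yield one dict per CSV row."""
    yield from csv.DictReader(io.StringIO(text))

def clean(rows):
    """Cleaning step: drop rows with missing or non-numeric values."""
    for row in rows:
        try:
            yield {
                "customer_id": int(row["customer_id"]),
                "age": int(row["age"]),
                "income": float(row["income"]),
            }
        except (ValueError, TypeError):
            # Skip rows that cannot be converted to numbers.
            continue

def transform(rows):
    """Transformation step: add a derived column (an assumed income band)."""
    for row in rows:
        row["income_band"] = "high" if row["income"] >= 50000 else "low"
        yield row

def run_pipeline(text):
    """Chain the steps so that each row flows from source to output."""
    return list(transform(clean(extract(text))))

prepared = run_pipeline(RAW_CSV)
for row in prepared:
    print(row)
```

Each function plays the role of one component in a data flow: rows stream from the source through the cleaning and transformation steps, and the prepared rows at the end are what a mining model would be trained on.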

This chapter begins with an overview of the SSIS components and continues by explaining how to perform data mining tasks in an SSIS environment.

In this chapter, you will learn about the following:

  • The basic concepts of SSIS, including control flow and data flow

  • Performing data mining–related transformations and tasks in SSIS

  • The text mining solution based on Term Extraction and Term Lookup transformations

Examples, datasets, and projects for this chapter are included in Chapter14.zip, which ...
