The Data Mining Process

A traditional practice in data mining is to train a model on existing data for which the outcome is already known, and then use that model to predict outcomes for new data. This requires several steps, only some of which happen within Analysis Services:

  • Business and data understanding: Understand the important questions and the data available to answer them. Insights gained must be relevant to business goals to be of use. Data must be of acceptable quality and relevance to yield reliable answers.
  • Prepare data: Preparing data can be a simple or difficult task depending on the current state of the data. Some of the tasks to consider include the following:
    • Eliminate rows of low data quality. The measure of quality is domain-specific. Eliminate rows with values outside expected norms, or rows that fail any test showing that the row describes an impossible or highly improbable case.
    • Eliminate duplicates, invalid values, or inconsistent values.
    • Denormalize data by creating views that present a single “case” table.
    • Consider smoothing erratic time series data to remove dramatic variations.
    • Add derived attributes, such as profit, that can be useful in the modeling process.
  • Model: You build Analysis Services models by first defining a data mining structure that specifies the tables to use as input. Then, add data mining models (different algorithms) to the structure. Use the training data to simultaneously train all the models within the structure.
  • Evaluate ...
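
The data preparation tasks listed above can be sketched outside Analysis Services in a few lines of plain Python. This is only an illustration under stated assumptions: the `orders` and `customers` tables, the column names, the non-negative quality rule, and the three-point moving average are all hypothetical, and in practice this work would more likely be done with SQL views over the source tables.

```python
from statistics import mean

# Hypothetical raw inputs: an orders table and a customers lookup table.
orders = [
    {"order_id": 1, "customer_id": 10, "revenue": 120.0, "cost": 80.0},
    {"order_id": 2, "customer_id": 11, "revenue": 95.0, "cost": 60.0},
    {"order_id": 2, "customer_id": 11, "revenue": 95.0, "cost": 60.0},   # duplicate
    {"order_id": 3, "customer_id": 10, "revenue": -40.0, "cost": 20.0},  # impossible value
    {"order_id": 4, "customer_id": 12, "revenue": 300.0, "cost": 210.0},
]
customers = {10: {"region": "West"}, 11: {"region": "East"}, 12: {"region": "West"}}


def is_plausible(row):
    """Domain-specific quality test (assumed here: revenue and cost are non-negative)."""
    return row["revenue"] >= 0 and row["cost"] >= 0


def drop_duplicates(rows, key):
    """Keep the first row seen for each key value."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


def build_case_table(rows, lookup):
    """Denormalize: merge customer attributes into each order and add derived profit."""
    cases = []
    for row in rows:
        case = dict(row)
        case.update(lookup[row["customer_id"]])
        case["profit"] = row["revenue"] - row["cost"]
        cases.append(case)
    return cases


def moving_average(values, window=3):
    """Smooth a series with a trailing moving average (shorter window at the start)."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        smoothed.append(mean(values[start:i + 1]))
    return smoothed


# Filter implausible rows, remove duplicates, then build the single case table.
clean = drop_duplicates([r for r in orders if is_plausible(r)], key="order_id")
case_table = build_case_table(clean, customers)

# Hypothetical erratic daily series with one dramatic spike.
daily_sales = [10, 12, 50, 11, 9]
smoothed_sales = moving_average(daily_sales)
```

Each case in `case_table` ends up as one flat row that combines order values, customer attributes, and the derived `profit` column, which is the shape a single “case” table is meant to have. The quality rule and the smoothing window are the parts most likely to change from one domain to another.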
