356 Solving Operational Business Intelligence with InfoSphere Warehouse Advanced Edition
10.1 Data mining in an operational warehouse environment
Although there are many definitions of data mining, we use the definition
provided in the IBM Redbooks publication InfoSphere Warehouse: A Robust
Infrastructure for Business Intelligence, SG24-7813:
... we define data mining as the process of discovering and modeling
non-trivial, potentially valuable patterns and relationships hidden in data. Data
mining is discovery driven, meaning that these techniques can find and
characterize relationships that are unknown and therefore cannot be
expressed explicitly.
1
The goal of data mining is to discover “potentially valuable” and “non-trivial”
patterns and data relationships that exist within enterprise data assets but are
neither obvious nor easy to locate and exploit. Because these patterns are
difficult to discover, a complex body of modeling techniques has been
developed over the past decades that uses advanced statistical and mathematical
operations to search for, discover, and characterize these useful patterns and
relationships.
Many questions might be raised from this discussion so far:
• What are these discovery and modeling processes?
• How do they work?
• What kinds of relationships and patterns do they detect?
• Why are these relationships and patterns valuable?
We seek to answer these questions and more in the following subsections,
framing the discussion in the context of operational data warehousing
and operational BI because that is the focus of this book. We begin with an
overview (or more accurately, a review) of data mining concepts and technology,
and then we move into a discussion of the data mining scenarios in an
operational warehouse environment, which illustrate the high value of data
mining. In the rest of the chapter, we explore the various techniques that are
available for InfoSphere Warehouse 10.1 to implement and deploy a data mining
solution.
10.1.1 Data mining overview
The concepts, technologies, and techniques that encompass the subject of data
mining form a vast topic that requires at least a full-sized book to explain
thoroughly. An in-depth study of data mining is useful but beyond the scope of
this particular document. Several IBM Redbooks publications address data
mining in an InfoSphere Warehouse environment in detail, including most
recently InfoSphere Warehouse: A Robust Infrastructure for Business
Intelligence, SG24-7813.
For completeness, we provide a summary of the concepts and processes here.
1 InfoSphere Warehouse: A Robust Infrastructure for Business Intelligence, SG24-7813.
Chapter 10. Techniques for data mining in an operational warehouse 357
Types of data mining
There are several data mining techniques, and they can be broadly classified into
one of two categories:
• Discovery techniques
• Predictive techniques
A single technique can fall into both categories, depending on the role it is
playing at a given time. We discuss each category in turn.
Discovery data mining and techniques
Discovery methods are designed to find patterns in the historical data without
any prior knowledge of what those patterns might be. Thus, we must discover the
patterns organically. InfoSphere Warehouse directly supports three discovery
mining methods:
• Clustering
The clustering algorithm groups data records into segments according to how
similar they are on the attributes “of interest.” For example, we can
profile our clients by grouping them according to similar purchasing behavior
or demographic attributes, and then direct more narrowly targeted
marketing at specific customer segments. A clustering method can
discover non-obvious client groupings based on analysis of these
demographic and behavior attributes.
• Associations
The association method identifies links (or associations) among the data
records of individual transactions, such as a single retail purchase of multiple
items in a grocery store. A form of link analysis, the associations
method is commonly used for market basket analysis, which finds the retail
items that tend to be purchased together. This knowledge enables retailers to
tailor their sales and promotions according to understood buyer patterns.
• Sequences
Another form of link analysis, this method finds sequential patterns across
multiple transactions, as in a sequence of customer events or purchases.
Knowledge of sequential client patterns or behavior can allow retailers, for
example, to tailor the shopping experience for individual customers, such as
