Chapter 9. Microsoft Clustering

Imagine yourself as a child sitting on the floor with a bag of marbles. You undo the leather strap and let the marbles spill out onto the floor. Instantly you notice that you have four different colors—red, blue, yellow, and green. You separate the marbles by color until you have four groups, but then you notice that some of the marbles are regulars, some are shooters, and some are peewees. You decide that the peewees can stay with the regular marbles, but the shooters belong in a separate group, because only one will be used per player. You look at the organization and are happy with your groups. You have just performed a clustering operation.
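As a rough illustration, the sorting rule from the marble story can be written down in a few lines of Python. This is only a sketch of the hand-made grouping described above, not a clustering algorithm and not anything specific to Microsoft Clustering; the marble data and the names `group_key` and `group_marbles` are hypothetical, made up for this example.

```python
from collections import defaultdict

# Hypothetical marbles as (color, size) pairs; the data is invented for illustration.
marbles = [
    ("red", "regular"), ("blue", "shooter"), ("red", "peewee"),
    ("green", "regular"), ("yellow", "shooter"), ("blue", "regular"),
]

def group_key(marble):
    """Return the group a marble belongs to under the rule from the story."""
    color, size = marble
    # Shooters go into one separate group; peewees stay with the
    # regular marbles of the same color.
    return "shooter" if size == "shooter" else color

def group_marbles(marbles):
    """Collect marbles into groups keyed by group_key."""
    groups = defaultdict(list)
    for marble in marbles:
        groups[group_key(marble)].append(marble)
    return dict(groups)

groups = group_marbles(marbles)
for name, members in sorted(groups.items()):
    print(name, members)
```

Here the grouping rule is fixed in advance by a person. A clustering algorithm differs in that it has to discover the groups from the data.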

Continuing with this marble scenario, when you look at the clusters again, you see that you have not only solid-color marbles, but also cat's-eyes, starbursts, crystals, steelies, and genuine agates. Some of your marbles are in perfect condition, but others are scuffed. Some are so chipped they don't roll straight. Now you are confused. Should you keep your simple groupings based on size and color, or should you add factors of style, material, and condition? (Most likely, you just go ahead and play marbles.)

Clustering is a simple, natural, and even automatic human operation for dealing with a small set of attributes. However, as the number of attributes grows, the problem becomes increasingly difficult and eventually impossible for the human mind to handle. It is possible for people with particular domain expertise ...

