When working with data mining, it is useful to understand mining algorithm basics and when to apply each algorithm. Table 57.2 summarizes common algorithms used for the problem categories presented in this chapter's introduction.
| Problem Type | Primary Algorithms |
|---|---|
| Segmentation | Clustering, Sequence Clustering |
| Classification | Decision Trees, Naive Bayes, Neural Network, Logistic Regression |
| Association | Association Rules, Decision Trees |
| Estimation | Decision Trees, Linear Regression, Logistic Regression, Neural Network |
| Sequence Analysis | Sequence Clustering |
These are guidelines only, because not every data mining problem falls neatly into one of these categories. Other algorithms may also apply to the listed problem types.
The decision trees algorithm is often the most accurate choice for many problems. It builds a decision tree that begins with the All node, which corresponds to all the training cases, as shown in Figure 57.3. An attribute is chosen to split those cases into groups, each group is then split on another attribute, and so on. The goal is to generate leaf nodes with a single predictable outcome. For example, if the goal is to identify who will purchase a bike, then each leaf node should contain cases that are either all bike buyers or all non-buyers, with no mixture of the two (or as close to that goal as possible).
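The splitting process described above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation used by any particular product. It assumes that the attribute to split on is the one that leaves the least entropy in the resulting groups, which is one common criterion; the text does not specify how the attribute is chosen. The attribute names (`commute`, `owns_car`, `bike_buyer`) and the six training cases are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(cases, target):
    """Measure how mixed the outcomes are; 0 means a single outcome."""
    counts = Counter(case[target] for case in cases)
    total = len(cases)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def split(cases, attribute):
    """Group cases by their value for the given attribute."""
    groups = {}
    for case in cases:
        groups.setdefault(case[attribute], []).append(case)
    return groups


def best_attribute(cases, attributes, target):
    """Assumed criterion: pick the attribute whose groups are least mixed."""
    def remaining_entropy(attribute):
        groups = split(cases, attribute).values()
        return sum(len(g) / len(cases) * entropy(g, target) for g in groups)
    return min(attributes, key=remaining_entropy)


def build_tree(cases, attributes, target):
    """Split recursively until a node is pure or no attributes remain."""
    outcomes = Counter(case[target] for case in cases)
    if len(outcomes) == 1 or not attributes:
        # Leaf node: predict the most frequent outcome among its cases.
        return {"leaf": True,
                "prediction": outcomes.most_common(1)[0][0],
                "cases": len(cases)}
    attribute = best_attribute(cases, attributes, target)
    remaining = [a for a in attributes if a != attribute]
    children = {value: build_tree(group, remaining, target)
                for value, group in split(cases, attribute).items()}
    return {"leaf": False, "split_on": attribute, "children": children}


def predict(tree, case):
    """Walk from the root (the All node) down to a leaf."""
    while not tree["leaf"]:
        tree = tree["children"][case[tree["split_on"]]]
    return tree["prediction"]


# Hypothetical training cases; the root node covers all six.
training_cases = [
    {"commute": "short", "owns_car": "no",  "bike_buyer": "yes"},
    {"commute": "short", "owns_car": "yes", "bike_buyer": "yes"},
    {"commute": "short", "owns_car": "yes", "bike_buyer": "yes"},
    {"commute": "long",  "owns_car": "no",  "bike_buyer": "yes"},
    {"commute": "long",  "owns_car": "yes", "bike_buyer": "no"},
    {"commute": "long",  "owns_car": "yes", "bike_buyer": "no"},
]

tree = build_tree(training_cases, ["commute", "owns_car"], "bike_buyer")
print(tree["split_on"])
print(predict(tree, {"commute": "long", "owns_car": "yes"}))
```

With this data, the first split is on `commute`: every short-commute case is a buyer, so that branch is already a pure leaf, while the long-commute branch is split again on `owns_car`. The sketch stops at pure leaves and raises a `KeyError` for attribute values it never saw in training; a practical implementation would also need rules for limiting tree growth and for handling unseen or continuous values.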