9.9. Conclusion

Mi-Ling's goal in this analysis was to explore some of the features that JMP provides to support classification and data mining. She began by using various visualization techniques to develop an understanding of the data and relationships among the variables. Then, she used formulas and row states to partition her data into a training set, a validation set, and a test set.
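For readers who would like to reproduce this partitioning step outside JMP, the idea of randomly assigning each row to one of three sets can be sketched in Python. The 60/20/20 proportions, the random seed, and the file name below are illustrative assumptions, not the values used in the case study.

import numpy as np
import pandas as pd

def partition(df, train=0.6, valid=0.2, seed=123):
    """Randomly assign each row to a Training, Validation, or Test set.

    Mirrors the idea of a random formula column plus row states in JMP;
    the 60/20/20 proportions and the seed are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    u = rng.random(len(df))                      # one uniform draw per row
    labels = np.where(u < train, "Training",
             np.where(u < train + valid, "Validation", "Test"))
    return df.assign(Set=labels)

# Example usage with a hypothetical data table:
# df = pd.read_csv("breast_cancer.csv")
# df = partition(df)
# df["Set"].value_counts()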

Mi-Ling was interested in investigating logistic, partition, and neural net fits. Given that her goal was to learn about these platforms, she constructed models in a fairly straightforward way. She fit four models using the training data: a logistic model, a partition model, and two neural net models. The best classification, based on performance on her validation set, was obtained with a neural net model whose structure was chosen using K-fold cross-validation. We note that Mi-Ling could have taken a number of more sophisticated approaches to her modeling endeavor, had she so desired.
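Although Mi-Ling worked through JMP's point-and-click platforms rather than code, the overall workflow — fit several classifiers on the training rows, choose a neural net structure by K-fold cross-validation, and compare misclassification rates on the validation rows — can be sketched with scikit-learn. The names train, valid, and X_cols, and the binary response column "Diagnosis", are assumptions carried over from the partitioning sketch above; this is an analogue of the workflow, not the JMP platforms themselves.

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV

# Assumed: train/valid DataFrames from the partitioning step,
# predictor columns X_cols, and a binary response column "Diagnosis".
X_tr, y_tr = train[X_cols], train["Diagnosis"]
X_va, y_va = valid[X_cols], valid["Diagnosis"]

models = {
    "Logistic": LogisticRegression(max_iter=1000),
    "Partition (tree)": DecisionTreeClassifier(max_depth=4),
    # Choose the number of hidden nodes by K-fold cross-validation (K = 5),
    # analogous to selecting the neural net structure in the chapter.
    "Neural net (CV)": GridSearchCV(
        MLPClassifier(max_iter=2000),
        param_grid={"hidden_layer_sizes": [(2,), (3,), (4,), (6,)]},
        cv=5,
    ),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    misclass = 1 - model.score(X_va, y_va)   # validation misclassification rate
    print(f"{name}: {misclass:.3f}")

As in the chapter, the validation rows are held out of all fitting and used only to compare the candidate models.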

Among Mi-Ling's four models, the partition model had the worst performance. In our experience, single partition models tend not to perform as well as nonlinear (or linear) regression techniques when the predictors are continuous. They can be very useful when there are categorical predictors, especially when those predictors have many levels. Moreover, unlike neural net models and even logistic models, partition models are very intuitive and interpretable, which makes them all the more valuable for data exploration.
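As a rough illustration of that interpretability, a fitted tree can be printed as plain if/then rules. The sketch below uses scikit-learn's export_text as a stand-in for JMP's partition report; it reuses the assumed training data from the sketches above and is not the book's own method.

from sklearn.tree import DecisionTreeClassifier, export_text

# Assumed: X_tr, y_tr, and X_cols as defined above; a shallow tree keeps the rules readable.
tree = DecisionTreeClassifier(max_depth=3).fit(X_tr, y_tr)
print(export_text(tree, feature_names=list(X_cols)))
# Each printed line is a threshold split on one predictor, so the model
# reads as a set of if/then rules rather than as a fitted equation.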
