Chapter 20 Case studies

20.1 Introduction

The previous chapters of this book contain R code examples that illustrate the mechanics of data mining algorithms. While some of the datasets used for these illustrations and demonstrations can be considered realistic (even if small), their scope is far too limited to adequately represent the typical data mining process. This chapter is meant to partially compensate for that limitation with case studies that better portray the path from data to models that has to be traversed in a real-world data mining project.

That said, the case studies remain limited in scope and depth compared to what would usually be done in practice. To make them easily reproducible, they all use publicly available datasets that can be loaded into R with single function calls, are relatively clean, and require very limited preprocessing. To keep the computational requirements within the reach of even old, low-performance personal computers, computationally intensive operations are avoided. No more than two or three modeling algorithms are used in each study, with little or no parameter tuning, which will hopefully encourage the reader to continue with other algorithms and parameter setups. Little or no statistical exploration of attribute distributions and relationships is included. Some methods of analysis that could be applied to all the datasets are only demonstrated for one of them to ...
