4 Data Preparation

4.1 Necessity of Data Preparation

Once useful data has been obtained, it needs to be prepared for analysis. Data is often stored at a very detailed level in a data warehouse, but relevant, reliable and repeatable results require that the data first be transformed and aggregated. The type of aggregation has a major impact on the final result, and data mining algorithms are unlikely to find hidden patterns without prior data preparation. Even if the user doing the data mining cannot carry out the transformations and aggregations personally, it is important for the user to define the necessary steps and make sure that someone else, for example colleagues in the IT department, carries them out. Most of the time, it is also useful to apply domain knowledge in the way the data preparation is done, for example, taking advantage of knowledge ...
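To make the point about aggregation concrete, the following is a minimal sketch in Python of collapsing detailed records into one row per customer. The record layout (`customer_id`, `month`, `amount`), the sample values and the function name `aggregate_per_customer` are all assumptions made for illustration; they are not taken from the book or from any particular data warehouse.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical transaction-level records, as they might sit in a data warehouse.
transactions = [
    {"customer_id": "A", "month": "2023-01", "amount": 100.0},
    {"customer_id": "A", "month": "2023-02", "amount": 100.0},
    {"customer_id": "A", "month": "2023-03", "amount": 100.0},
    {"customer_id": "B", "month": "2023-03", "amount": 300.0},
]


def aggregate_per_customer(rows):
    """Collapse transaction rows into one analysis record per customer."""
    amounts = defaultdict(list)  # customer_id -> list of purchase amounts
    months = defaultdict(set)    # customer_id -> distinct months with a purchase
    for row in rows:
        amounts[row["customer_id"]].append(row["amount"])
        months[row["customer_id"]].add(row["month"])
    return {
        cid: {
            "total_amount": sum(vals),
            "n_purchases": len(vals),
            "mean_amount": mean(vals),
            "active_months": len(months[cid]),
        }
        for cid, vals in amounts.items()
    }


customer_table = aggregate_per_customer(transactions)
for cid, features in sorted(customer_table.items()):
    print(cid, features)
```

In this invented sample, both customers have the same total spend, so an analysis built on the total alone could not tell them apart, whereas the purchase count, mean amount and number of active months separate a regular buyer from a one-off buyer. Which of these aggregates is appropriate depends on the business question and on domain knowledge.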
