9.1 SUMMARY OF PROCESS

Exploratory data analysis and data mining is a process that involves defining the problem, collecting and preparing the data, and implementing the analysis. Once completed and evaluated, the project should be delivered to those who will use the information. Following a process has many advantages, including avoiding common pitfalls in analyzing data and ensuring that the project meets expectations. This book has described the process in four steps:

  1. Problem definition: Prior to any analysis, the problem to be solved should be clearly defined and related to one or more business objectives. Describing the deliverables will focus the team on delivering the solution and will set correct expectations for other parties interested in the outcome of the project. A multidisciplinary team, driven by a project leader, is best suited to solve these problems. A plan for the project should be developed, covering the objectives and deliverables along with a timeline and a budget. An analysis of the relationship between the cost of the project and the benefit derived for the business can form the basis for a go/no-go decision on the project.
  2. Data preparation: The quality of the data is the most important factor influencing the quality of the results of the analysis. The data should be carefully collected, integrated, characterized, and prepared for analysis. Data preparation includes cleaning the variables to ensure consistent naming and removing potential errors. Eliminating ...
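
The cleaning described in step 2 (consistent naming and removal of potential errors) might be sketched as follows. This is a minimal illustration and not a method taken from the book: the records, the field names, the label mapping, and the 0 to 120 age range are all assumptions made for the example.

```python
# Hypothetical raw records; field names and values are invented for illustration.
raw_records = [
    {"Gender ": "M", "age": "34"},
    {"Gender ": "male", "age": "-1"},
    {"Gender ": "F", "age": "n/a"},
]

# Assumed mapping from the spellings seen in the data to one consistent label.
GENDER_LABELS = {"m": "male", "male": "male", "f": "female", "female": "female"}


def clean_record(record):
    """Return a copy with consistent names and values; suspect values become None."""
    cleaned = {}
    for name, value in record.items():
        # Normalize variable names: strip stray whitespace and lowercase them.
        cleaned[name.strip().lower()] = value.strip()

    # Map every known spelling to a single label; unknown spellings become None.
    cleaned["gender"] = GENDER_LABELS.get(cleaned["gender"].lower())

    # Treat non-numeric or out-of-range ages as potential errors
    # (the 0 to 120 range is an assumption for this sketch).
    try:
        age = int(cleaned["age"])
    except ValueError:
        age = None
    cleaned["age"] = age if age is not None and 0 <= age <= 120 else None
    return cleaned


cleaned_records = [clean_record(r) for r in raw_records]
```

Marking suspect values as missing, rather than silently dropping the whole record, keeps the decision about how to handle them (remove, impute, or investigate) as a separate and visible step.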

