O'Reilly logo

Fundamentals of Predictive Analytics with JMP, Second Edition by B. D. McCullough, Ron Klimberg

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 3: Dirty Data

Introduction

Data Set

Error Detection

Outlier Detection

Approach 1

Approach 2

Missing Values

Statistical Assumptions of Patterns of Missing

Conventional Correction Methods

The JMP Approach

Example Using JMP

General First Steps on Receipt of a Data Set

Exercises

Introduction

Dirty data refers to fields or variables within a data set that are erroneous. Possible errors could range from spelling mistakes, incorrect values associated with fields or variables, or simply missing or blank values. Most real-world data sets have some degree of dirty data. As shown in Figure 3.1, dealing with dirty data is one of the multivariate data discovery steps.

In some situations (for example, when the original data source can be obtained), ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required