O'Reilly logo

Python for Data Science For Dummies by Luca Massaron, John Paul Mueller

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 16

Detecting Outliers in Data

In This Chapter

arrow Understanding what is an outlier

arrow Distinguishing between extreme values and novelties

arrow Using simple statistics for catching outliers

arrow Finding out most tricky outliers by advanced techniques

Errors happen when you least expect, and that’s also true in regard to your data. In addition, data errors are difficult to spot, especially when your dataset contains many variables of different types and scale (a high-dimensionality data structure).

Data errors can take a number of forms. For example, the values may be systematically missing on certain variables, erroneous numbers could appear here and there, and the data could include outliers. A red flag has to be raised when the following characteristics are met:

  • Missing values on certain groups of cases or variables imply that some specific cause is generating the error.
  • Erroneous values depend on how the application has produced or manipulated the data. For instance, you need to know whether the application has obtained data from a measurement instrument. External conditions and human ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required