Finding outliers in data

Outliers are values that are particularly extreme compared to the others, that is, values clearly distant from the other available observations. Outliers are a problem because they tend to distort the results of a data analysis, particularly descriptive statistics and correlations. They should be identified during the data cleaning phase, but they can also be dealt with in the next step of the analysis. Outliers can be univariate, when they have an extreme value on a single variable, or multivariate, when they have an unusual combination of values on several variables.

Outliers are the values of a distribution that are extremely high or extremely low compared to the rest of the distribution, and thus ...
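One common rule for flagging extremely high or low values of a single variable is Tukey's fences. Under this rule, a value is flagged when it lies below Q1 − 1.5·IQR or above Q3 + 1.5·IQR, where IQR = Q3 − Q1. This technique is a swapped-in illustration and not necessarily the method this text goes on to use. The function name `iqr_outliers` and the sample data are hypothetical. The sketch relies on `statistics.quantiles` with its default `'exclusive'` method, and other quantile definitions can shift the fences slightly.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper for illustration; k=1.5 is the conventional multiplier.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Keep the original order so flagged values are easy to trace back
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample with one extremely high and one extremely low value
sample = [10, 12, 11, 13, 12, 11, 14, 13, 12, 95, -30]
print(iqr_outliers(sample))  # [95, -30]
```

Quartiles are themselves robust to a few extreme values, so the fences are not dragged toward the outliers the way a mean ± standard deviation rule would be.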
