Building models with and without outliers

The Anomaly Modeling node can automatically identify and remove outliers. Why not always remove outliers? Even after close examination of the data, it can be difficult to decide whether any cases should be regarded as outliers and, if so, which ones. And even when the data miner feels confident about that decision, the internal or external client may not agree.

Some types of analysis, such as calculating a median, are barely affected by outliers. Many widely used modeling methods, however, can be strongly influenced by them: a single outlier in the data can shift a linear regression model significantly.
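To make the contrast between a median and a linear regression concrete, here is a minimal sketch in plain Python, outside of Modeler. The data is invented for illustration: ten points that lie exactly on y = 2x, plus a copy in which the last value is replaced by 100. The helper `fit_line` is a hypothetical name for an ordinary least-squares fit and is not part of any Modeler API.

```python
from statistics import mean, median


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Invented illustration data: ten points exactly on y = 2x.
xs = list(range(1, 11))
ys_clean = [2 * x for x in xs]

# Same data, but the last case is replaced by one extreme value.
ys_outlier = ys_clean[:-1] + [100]

median_clean = median(ys_clean)
median_outlier = median(ys_outlier)

slope_clean, intercept_clean = fit_line(xs, ys_clean)
slope_outlier, intercept_outlier = fit_line(xs, ys_outlier)

print("median without / with outlier:", median_clean, median_outlier)
print("slope without / with outlier: ", slope_clean, round(slope_outlier, 2))
```

In this toy data the median is the same with and without the extreme case, while the fitted slope roughly triples. Real data will rarely show so clean a contrast, but the direction of the effect is the same.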

What are the risks? A model that is affected by an outlier may frequently predict values ...
