Removing outliers

So far we have seen different techniques for identifying possible outliers. What should we do once we have found them? The next step is to determine whether the outlying values in the column are valid or invalid for the dataset.

If the values are invalid because of an error made while the dataset was being populated, then we must correct them. The correction may involve replacing the value with a presumably valid one or removing the entire row. In the latter case, we must pay attention to the weight that this action can have on the whole dataset.
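The two corrections described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the rows, the column name value, and the helper is_invalid are hypothetical, and using the median of the remaining values as the "presumably valid" replacement is just one possible choice, not a method prescribed by the text.

```python
from statistics import median

# Hypothetical data: a small table whose "value" column contains one
# suspected data-entry error (100).
rows = [
    {"id": 1, "value": 9},
    {"id": 2, "value": 11},
    {"id": 3, "value": 10},
    {"id": 4, "value": 100},
    {"id": 5, "value": 12},
    {"id": 6, "value": 8},
]


def is_invalid(row):
    # Assumption: the value 100 has already been judged invalid
    # by one of the outlier-detection techniques.
    return row["value"] == 100


# Option 1: replace the invalid value with a presumably valid one.
# Here the replacement is the median of the values that remain
# (an illustrative choice).
valid_values = [r["value"] for r in rows if not is_invalid(r)]
replacement = median(valid_values)
replaced = [
    {**r, "value": replacement} if is_invalid(r) else r
    for r in rows
]

# Option 2: remove the entire row, then check how much of the
# dataset the removal costs.
kept = [r for r in rows if not is_invalid(r)]
removed_fraction = (len(rows) - len(kept)) / len(rows)

print("replacement value:", replacement)
print("rows kept:", len(kept), "of", len(rows))
print("fraction removed:", round(removed_fraction, 3))
```

Both options build new lists and leave the original rows untouched, so the two results can be compared before committing to either. The removed fraction is a simple way to judge the weight of dropping rows: losing one row in six matters far more than losing one row in several thousand.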

To replace the value 100, which appears invalid in every respect (perhaps it was meant to be 10 and an extra zero was added), we can ...
