Using filter() to identify outliers
In the previous chapter, we defined some useful statistical functions to compute the mean and standard deviation and to normalize a value. We can use these functions to locate outliers in our trip data. We apply the mean() and stdev() functions to the distance value in each leg of a trip to get the population mean and standard deviation.
We can then use the z() function to compute a normalized value for each leg. If the normalized value is more than 3, the data point is extremely far from the mean. If we reject these outliers, we have a more uniform set of data that's less likely to harbor reporting or measurement errors.
The following is how we can tackle this:
from stats import mean, stdev, z
dist_data = list(map(dist, ...
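Because the listing above is cut off, here is a minimal, self-contained sketch of the same idea. It does not reproduce the book's code. The mean(), stdev(), and z() functions defined here are stand-ins for the ones in the stats module, and we assume stdev() is the population standard deviation. The dist() helper, the (start, end, distance) leg layout, and the sample data are all hypothetical, invented only for illustration.

```python
from math import sqrt

# Stand-ins for the stats module's functions (assumed definitions).
def mean(data):
    return sum(data) / len(data)

def stdev(data):
    # Population standard deviation (divides by n, not n - 1).
    m = mean(data)
    return sqrt(sum((x - m) ** 2 for x in data) / len(data))

def z(x, mu, sigma):
    # Normalized value: how many standard deviations x lies from mu.
    return (x - mu) / sigma

# Hypothetical leg layout: (start, end, distance).
def dist(leg):
    return leg[2]

# Invented sample data: 19 ordinary legs plus one suspiciously long leg.
trip = [(f"p{i}", f"p{i + 1}", 10.0 + (i % 3)) for i in range(19)]
trip.append(("p19", "p20", 100.0))

dist_data = list(map(dist, trip))
mu = mean(dist_data)
sigma = stdev(dist_data)

# filter() keeps only the legs for which the predicate is true.
outliers = list(filter(lambda leg: abs(z(dist(leg), mu, sigma)) > 3, trip))
kept = list(filter(lambda leg: abs(z(dist(leg), mu, sigma)) <= 3, trip))

print("outliers:", outliers)
print("legs kept:", len(kept))
```

In this sketch we compare the absolute value of the normalized value against 3, so unusually short legs would be rejected as well as unusually long ones. Dropping abs() would flag only the legs that are far above the mean. Note also that with a population standard deviation, a single extreme value can only exceed a normalized value of 3 when the data set has more than ten points, which is why the sample trip has twenty legs.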