In the previous chapter, we defined some useful statistical functions to compute the mean and standard deviation and to normalize a value. We can use these functions to locate outliers in our trip data. We can apply the mean() and stdev() functions to the distance value in each leg of a trip to get the population mean and standard deviation.
We can then use the z() function to compute a normalized value for each leg. If the normalized value is more than 3, the distance lies more than three standard deviations above the mean. If we reject these outliers, we have a more uniform set of data that's less likely to harbor reporting or measurement errors.
The following is how we can tackle this:
from stats import mean, stdev, z
dist_data ...
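As a rough, self-contained illustration of the approach described above, the following sketch rejects legs whose normalized distance exceeds 3. It is not the chapter's listing: the local mean(), stdev(), and z() functions are stand-ins for the ones in the stats module, and the leg layout (start, end, distance), the reject_outliers() name, and the sample data are all assumptions made for this example.

```python
from math import sqrt

# Stand-ins for the previous chapter's functions (assumed behavior).
def mean(samples):
    return sum(samples) / len(samples)

def stdev(samples):
    # Population standard deviation: divide by N, not N - 1.
    mu = mean(samples)
    return sqrt(sum((x - mu) ** 2 for x in samples) / len(samples))

def z(x, mu, sigma):
    return (x - mu) / sigma

def reject_outliers(legs, threshold=3.0):
    """Return the legs whose normalized distance is at most threshold.

    Each leg is assumed to be a (start, end, distance) tuple.
    """
    legs = list(legs)
    distances = [leg[2] for leg in legs]
    mu_d = mean(distances)
    sigma_d = stdev(distances)
    if sigma_d == 0:
        # All distances are identical, so there is nothing to reject.
        return legs
    return [leg for leg in legs if z(leg[2], mu_d, sigma_d) <= threshold]

# Hypothetical data: 19 ordinary legs and one suspiciously long one.
sample_legs = [((i, i), (i + 1, i + 1), 10.0 + (i % 3)) for i in range(19)]
sample_legs.append(((19, 19), (20, 20), 500.0))

kept = reject_outliers(sample_legs)
```

This test is one-sided, matching the "more than 3" rule in the text. Comparing abs(z(...)) against the threshold instead would also reject unusually short legs. Note that with the population standard deviation, a single extreme value in N samples can never score above sqrt(N - 1), so a threshold of 3 can only reject anything when there are more than 10 legs.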