7 BELIEVE IT OR DON’T: USING OUTLIER DETECTION TO FIND THE WEIRDEST OF THE WEIRD

Did you hear about the man who stole a GPS, but ended up having to call 911 when he got lost? How about the murder of nine college students that was recently blamed on the Yeti (the Russian Bigfoot)? Or the California school kids who made a fifty-foot-long peanut butter and jelly sandwich in less than three minutes?

On any given day, you can search the Internet and find stories like these: true reports that are stranger than fiction. Many news outlets carry weird news stories, and a surprising number of these stories seem to come from the state of Florida. In fact, a recent Google search on “weird news Florida” revealed over 47,000,000 hits. That’s a lot of strange. And from sewer-surfing alligators to whale-wrangling nudists, the Sunshine State has it all.

Outliers are the Florida of the data analysis world. These strange observations sit at the extremes, far away from the rest of your data. And like the story about a twelve-foot python caught wrapping itself around an unsuspecting woman’s toilet, outliers can leave you slightly disturbed, wondering what might’ve happened had they not been found. In this chapter, you’ll study News of the Weird stories and use outlier detection to find the weirdest of the weird.

THE WORLD OF THE WEIRD

If you’ve read the last few chapters, you’ve already run across outliers. You know they’re extreme values that sit far away from the center of a dataset. ...
