Bad Data Handbook by Q. Ethan McCallum

Chapter 19. Data Quality Analysis Demystified: Knowing When Your Data Is Good Enough

Ken Gleason and Q. Ethan McCallum

Most of the decisions we make in our personal and professional lives begin with a query. That query might serve a presentation, a research project, or a business forecast, or it might be as simple as finding the optimal combination of shipping time and price on tube socks. Sometimes we are intuitively comfortable with our data source, or we are not overly concerned about the breadth or depth of the answers we get, as when we are looking at movie reviews. Other times we care a little more, for example when estimating our food and water requirements for the Badwater Ultramarathon.[76] The same goes for more mundane tasks, such as figuring out how much of a product to make or where the production bottlenecks are on an assembly line.

But how do we know when to care and when not to care, and about what? Should you throw away the survey data because a couple of people failed to answer certain questions? Should you blindly accept that your daily sales of widgets in Des Moines seem to quintuple on alternate Fridays? Maybe, maybe not. Much of what you (think you) know about the quality of a given set of data relies on past experience that has evolved into intuition. But relying solely on intuition has three problems. First, intuition is good at trapping obvious outliers (errors that visibly stick out) but is unlikely to catch subtler issues. Second, ...
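One way to move the Des Moines question from gut feeling to something you can inspect is to make the pattern explicit. The following is a minimal sketch, not a method from this chapter. It assumes daily sales totals are held in a dict keyed by date. The function names `flag_spikes` and `gaps_in_days`, the 3x-median threshold, and the sample data are all hypothetical choices made for illustration.

```python
from datetime import date, timedelta
from statistics import median


def flag_spikes(daily_sales: dict[date, float], ratio: float = 3.0) -> list[date]:
    """Return the dates whose sales exceed `ratio` times the median daily total."""
    typical = median(daily_sales.values())
    return sorted(d for d, total in daily_sales.items() if total > ratio * typical)


def gaps_in_days(dates: list[date]) -> list[int]:
    """Days between consecutive flagged dates; a constant gap hints at a schedule."""
    return [(b - a).days for a, b in zip(dates, dates[1:])]


# Hypothetical data: 8 weeks of widget sales at a baseline of 100 per day,
# with every other Friday at five times the baseline.
start = date(2024, 1, 1)  # a Monday
sales = {}
for i in range(56):
    day = start + timedelta(days=i)
    is_alternate_friday = day.weekday() == 4 and (i // 7) % 2 == 0
    sales[day] = 500.0 if is_alternate_friday else 100.0

spikes = flag_spikes(sales)
print([d.isoformat() for d in spikes])  # the four flagged Fridays
print(gaps_in_days(spikes))             # [14, 14, 14]
```

The median is used as the baseline because, unlike the mean, it is not dragged upward by the spikes it is trying to detect. A check like this only surfaces the pattern. Deciding whether a biweekly spike is a real effect (say, a payday promotion) or a loading error still takes knowledge of how the data was produced.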
