10. Scattered Data

A remark often attributed to Henry Ford, and more commonly to the retailer John Wanamaker, goes: “Half of my advertising budget is wasted, but I don’t know which half.” Similarly, we could say that half of our data are noise, but we don’t know which half. There is always signal (useful information) and noise (irrelevant variation) in a data set. Not only is it often hard to tell which is which, but what counts as signal and what counts as noise also changes from task to task and from user to user.

In our pursuit of simplification and the good form, it would be ideal if we could reduce an entire data distribution to a single indicator like the mean, with an acceptable level of information loss. We can find such variables, but they’ll lack relevant variation and they’ll ultimately be useless.
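To make the cost of that reduction concrete, here is a minimal sketch in Python. The two lists are hypothetical values invented for illustration, not data from the book. Both have the same mean, yet one is tightly clustered and the other widely scattered, so the mean alone hides the difference.

```python
from statistics import mean, pstdev

# Hypothetical values, chosen only so that both lists share the same mean.
tight = [48, 49, 50, 51, 52]
scattered = [10, 30, 50, 70, 90]


def summarize(values):
    """Return the mean and the population standard deviation of values."""
    return mean(values), pstdev(values)


tight_mean, tight_sd = summarize(tight)
scattered_mean, scattered_sd = summarize(scattered)

# The means match, but the spread around them does not.
print(f"tight:     mean={tight_mean}, sd={tight_sd:.2f}")
print(f"scattered: mean={scattered_mean}, sd={scattered_sd:.2f}")
```

A single indicator would report both data sets as identical; adding even one measure of spread shows how much variation the mean leaves out.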

However, we may be luckier than ...
