Chapter 3. Data Management

You may wonder what a chapter on data management is doing in a book about statistics. It’s really very simple: statistics is about analyzing data, and the validity of a statistical result depends in large part on the validity of the data analyzed. So if you will be working with statistics, you need to know something about data management, whether you will be performing the necessary tasks yourself or delegating them to someone else. Oddly enough, data management is often ignored in conventional statistics classes, as well as in many offices and labs: professors and project managers alike sometimes seem to believe that data will magically organize itself without the need for human intervention. However, people who work with data on a daily basis are more likely to subscribe to the 80/20 rule, which says that you spend 80% of your time preparing the data for analysis and only 20% of your time actually analyzing it. Additionally, even people who understand the need for data management often act as if everyone were born knowing how to do it, unlike skills such as doing linear algebra or riding a bicycle, which actually need to be learned. This is nonsense: data management is a skill that can be learned like any other, and while it is certainly possible to learn it on the job, a.k.a. the School of Hard Knocks, there’s no reason not to take advantage of the collective wisdom of those who have gone before you.

The quality of analysis depends on the quality ...

