Data Analysis with Open Source Tools

Chapter 11. Intermezzo: Mythbusting—Bigfoot, Least Squares, and All That

Everybody has heard of Bigfoot, the mystical figure that lives in the woods, but nobody has ever actually seen him. Similarly, there are some concepts from basic statistics that everybody has heard of but that, like Bigfoot, always remain a little shrouded in mystery. Here, we take a look at three of them: the average of averages, the mystical standard deviation, and the ever-popular least squares.

How to Average Averages

Recently, someone approached me with the following question: given the numbers in Table 11-1, what number should be entered in the lower-right corner? Just adding up the individual defect rates per item and dividing by 3 (in effect, averaging them) did not seem right—if only because it would come out to about 0.75, which is pretty high when one considers that most of the units produced (100 out of 103) are not actually defective. The specific question asked was: “Should I weight the individual rates somehow?”

This situation comes up frequently but is not always recognized: we have a set of rates (or averages) and would like to summarize them into an overall rate (or overall average). The problem is that the naive way of doing so (namely, to add up the individual rates and then to divide by the number of rates) will generally give an incorrect result, because it treats every rate as equally important no matter how many units it is based on. The naive result is correct only when all the rates share the same denominator. However, the error is rarely noticed unless the numbers involved are as extreme as in the present example.
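To make the difference concrete, here is a minimal Python sketch. The counts in it are made up for illustration and are not the values from Table 11-1; the function names are likewise my own. It compares the naive average of the rates with the overall rate obtained by dividing total defects by total units, and it shows that weighting each rate by its number of units reproduces that overall rate.

```python
# Hypothetical counts for three items (NOT the data from Table 11-1).
defects = [1, 1, 2]      # defective units per item
units = [2, 2, 100]      # units produced per item


def naive_average(defects, units):
    """Average the per-item rates, ignoring how many units each covers."""
    rates = [d / u for d, u in zip(defects, units)]
    return sum(rates) / len(rates)


def pooled_rate(defects, units):
    """Overall rate: total defects divided by total units produced."""
    return sum(defects) / sum(units)


def weighted_average(defects, units):
    """Average the per-item rates, weighting each by its unit count."""
    rates = [d / u for d, u in zip(defects, units)]
    return sum(r * u for r, u in zip(rates, units)) / sum(units)


naive = naive_average(defects, units)
pooled = pooled_rate(defects, units)
weighted = weighted_average(defects, units)

print(f"naive average of rates: {naive:.4f}")
print(f"pooled overall rate:    {pooled:.4f}")
print(f"weighted average:       {weighted:.4f}")
```

With these made-up counts, the naive average is dominated by the two small batches, whereas the pooled and weighted results agree with each other. The weights here are simply the denominators of the individual rates, which is one natural answer to the question "Should I weight the individual rates somehow?"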

Table 11-1. Defect rates: what value should go into the ...


Data Analysis with Open Source Tools by Philipp K. Janert
