48 Obstacles and Maneuvers

The formulaic methods for scaled data in Parts II, III, and IV assume that the sample data itself is normally distributed, or fairly close to it. While this assumption is often warranted, sometimes it is not. The "unruly data" statistical scenario is a case in point. That data harbors two nonnormal traits that are common obstacles to using the statistical analysis methods we've seen so far for scaled data.

The first obstacle, as you know, is outliers. Figure 48.2a shows the impact of one extreme outlier in a sample of size 30 (the outlier value of 1000 shows up as a little nub in the "More" slot). Figure 48.2b shows the data with the outlier removed. Notice the large impact the outlier has on the sample mean and the even larger impact it has on the sample variance, which in turn will wreak havoc on various other sample statistics, confidence intervals, and significance tests.
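To see the effect for yourself, here is a minimal Python sketch. The 29 "ordinary" values are made up for illustration and are not the data behind Figure 48.2; only the sample size of 30 and the outlier value of 1000 come from the text. The sketch computes the sample mean and the sample variance (with the usual n - 1 denominator) with and without the outlier.

```python
import statistics

# Hypothetical sample: 29 ordinary values (NOT the book's data),
# chosen only so that they cluster in the low-to-mid twenties.
ordinary = [10, 12, 13, 14, 15, 15, 16, 17, 18, 18, 19, 20, 20, 21, 21,
            22, 22, 23, 24, 24, 25, 26, 27, 28, 29, 30, 31, 33, 36]

# Add the single extreme outlier mentioned in the text, for a sample of size 30.
with_outlier = ordinary + [1000]


def summarize(data):
    """Return (sample mean, sample variance with n - 1 denominator)."""
    return statistics.mean(data), statistics.variance(data)


mean_clean, var_clean = summarize(ordinary)
mean_out, var_out = summarize(with_outlier)

print(f"without outlier: mean = {mean_clean:.1f}, variance = {var_clean:.1f}")
print(f"with outlier:    mean = {mean_out:.1f}, variance = {var_out:.1f}")
print(f"mean grew by a factor of {mean_out / mean_clean:.1f}")
print(f"variance grew by a factor of {var_out / var_clean:.1f}")
```

The variance is hit harder than the mean because it is built from squared deviations: a value that sits far from the rest contributes its distance squared, so one extreme point can dominate the entire sum.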

(a) Histogram of the data sample with one extreme outlier (sample mean = 53, sample variance = 32,036); frequency on the y-axis (0 to 20), data value bins on the x-axis (10 to "More"). (b) Histogram of the same sample with the outlier removed (sample mean = 21, sample variance = 48); frequency on the y-axis (0 to 20), data value bins on the x-axis (10 to 70).

Figure 48.2

Statistical ...
