50. What Rank Sums Happen Just by Chance?

Let's experiment with using rank numbers for statistical inference. We'll simulate a generalized version of the following scenario: We have annual income data for 100 randomly chosen people, a group of 50 males and a group of 50 females. The income data are unruly, with both skew and outliers, so we sort the 100 people by annual income and rank them 1–100. We'll presume that, in the population, males and females do not differ in annual income rankings; this is the null hypothesis.
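Ranking replaces each income with its position in sorted order, and that is what tames the skew and outliers: the most extreme earner simply gets the top rank. The following is a minimal Python sketch, not the book's own method. The incomes are made up (lognormal values, chosen only because they produce a right skew) plus one hypothetical extreme earner.

```python
import random

rng = random.Random(7)

# Hypothetical, synthetic incomes for illustration only: 99 right-skewed
# (lognormal) values plus one extreme outlier.
incomes = [round(rng.lognormvariate(10.8, 0.7)) for _ in range(99)]
incomes.append(25_000_000)

# Rank 1 = lowest income, 100 = highest. Any exact ties would be broken
# arbitrarily by list position here.
order = sorted(range(len(incomes)), key=lambda i: incomes[i])
ranks = [0] * len(incomes)
for rank, i in enumerate(order, start=1):
    ranks[i] = rank

# The outlier's income dwarfs the rest, but its rank is simply 100,
# one step above the next-highest person's rank of 99.
print(incomes[-1], ranks[-1])
```

However large the outlier is, it can pull the rank sum of its group up by at most a few ranks, which is why rank-based comparisons are robust to such data.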

As we saw in the previous chapter, the ranks 1–100 sum to 5050. If the males and females in the sample are ranked equally overall, each group will have a rank sum of half that: 2525. If they differ, one group's rank sum will fall below 2525 by some amount and the other group's will exceed 2525 by exactly the same amount, because the two sums must add up to 5050. So we only need to look at one of the rank sums. We can, for example, sum the actual ranks of the males and see how far that sum is from 2525. How far away is far enough to reject the null hypothesis?
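A quick check of this arithmetic in Python, as a sketch that assumes no tied incomes (the random split of ranks into two groups is a hypothetical example): the ranks 1–100 sum to 5050, and however the 100 ranks are divided into two groups of 50, the two rank sums add back to 5050, so their distances from 2525 are equal in size and opposite in sign.

```python
import random

N = 100        # people ranked 1..N (assumes no tied incomes)
N_MALE = 50    # group size; the other 50 are female

all_ranks = list(range(1, N + 1))
total = sum(all_ranks)          # same as N * (N + 1) // 2
half = total // 2               # each group's sum if the groups rank equally overall

# A hypothetical random split of the 100 ranks into two groups of 50.
rng = random.Random(0)
male_ranks = rng.sample(all_ranks, N_MALE)
male_set = set(male_ranks)
female_ranks = [r for r in all_ranks if r not in male_set]

male_gap = sum(male_ranks) - half
female_gap = sum(female_ranks) - half

print(total, half)               # 5050 2525
print(male_gap, female_gap)      # equal size, opposite sign
```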

The simulation we'll perform for this scenario draws a sample of 50 random male rank numbers from the set of possible rank values 1–100, and then sums the 50 randomly ...
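The excerpt breaks off mid-description, so what follows is only a minimal Python sketch of a simulation along these lines, not the book's implementation. It assumes the 50 ranks are drawn without replacement (each rank belongs to exactly one person) and that there are no ties. The trial count, seed, the helper name `simulate_rank_sums`, and the 5% two-tailed cutoff are all illustrative assumptions. For comparison, it also prints the standard deviation given by the well-known no-ties formula for the null distribution of a rank sum, sqrt(n1 · n2 · (N + 1) / 12).

```python
import random

N = 100            # people ranked 1..N (assumes no tied incomes)
N_MALE = 50        # size of the male group; the remaining 50 are female
TRIALS = 20_000    # number of simulated samples (an arbitrary choice)
EXPECTED = N_MALE * (N + 1) / 2   # 2525.0, the rank sum under the null


def simulate_rank_sums(trials, seed=42):
    """Hypothetical helper: repeatedly draw 50 of the ranks 1..N without
    replacement and return the list of their sums."""
    rng = random.Random(seed)
    ranks = range(1, N + 1)
    return [sum(rng.sample(ranks, N_MALE)) for _ in range(trials)]


sums = sorted(simulate_rank_sums(TRIALS))

# Approximate two-tailed 5% cutoffs: the 2.5th and 97.5th percentiles.
lower = sums[round(0.025 * TRIALS)]
upper = sums[round(0.975 * TRIALS) - 1]

mean = sum(sums) / TRIALS
sd = (sum((s - mean) ** 2 for s in sums) / (TRIALS - 1)) ** 0.5

# Standard no-ties formula for the null SD of a rank sum.
theory_sd = (N_MALE * (N - N_MALE) * (N + 1) / 12) ** 0.5  # about 145.06

print(f"mean of simulated sums: {mean:.1f} (expected {EXPECTED})")
print(f"SD of simulated sums:   {sd:.1f} (formula {theory_sd:.1f})")
print(f"middle 95% of chance rank sums: {lower} to {upper}")
```

Under these assumptions, an observed male rank sum falling outside the printed range would be unusual if the null hypothesis were true, at roughly the 5% level.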
