Let us assume that a data set has been summarized in the form of an absolute frequency distribution involving classes of a variable X and absolute class frequencies $f_j$ (see Chapter 2). This type of data set appears in Table 3.A.1. Quite often one is faced with analyzing a set of observations presented in this format. (For instance, some data sets contain proprietary information, and the owner will only agree to present the data in summary form rather than release the individual observation values. Or the data set may simply be too large to have all of its observations printed in a report or document.) Given that the individual observations lose their identity in the grouping process, can we still find the mean, median, mode, standard deviation, and quantiles of X? We can, but not by using the formulas presented earlier in this chapter. In fact, we can only get "approximations" to the mean, median, and so on. However, as we shall soon see, these approximations are quite good. In what follows, we shall assume that we are dealing with a sample of size n.
| Classes of X | $f_j$ |
|---|---|
To approximate the sample mean $\bar{X}$, let us use the formula

$$\bar{X} \approx \frac{\sum_{j} f_j m_j}{n},$$

where $m_j$ is the class mark (midpoint) of the $j$th class and ...
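Computing this approximation amounts to treating every observation in class $j$ as if it were equal to the class mark $m_j$, weighting each class mark by its frequency $f_j$, and dividing by $n = \sum_j f_j$. The following is a minimal sketch of that calculation. The function name `grouped_mean`, the representation of classes as `(lower, upper)` limit pairs, and the sample frequencies are hypothetical illustrations, not data from Table 3.A.1.

```python
def grouped_mean(classes, freqs):
    """Approximate the sample mean from an absolute frequency distribution.

    classes: list of (lower, upper) class limits (hypothetical input format)
    freqs:   list of absolute class frequencies f_j, one per class
    Every observation in class j is treated as if it equaled the class mark m_j.
    """
    if len(classes) != len(freqs):
        raise ValueError("need exactly one frequency per class")
    n = sum(freqs)  # sample size n is the sum of the class frequencies
    if n == 0:
        raise ValueError("class frequencies sum to zero")
    marks = [(lo + hi) / 2 for lo, hi in classes]  # class marks m_j (midpoints)
    return sum(f * m for f, m in zip(freqs, marks)) / n


# Hypothetical frequency distribution, for illustration only
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [3, 7, 6, 4]
print(grouped_mean(classes, freqs))  # prints 30.5
```

Here the class marks are 15, 25, 35, and 45, so the weighted sum is 610 and, with n = 20, the approximate mean is 30.5. The result is exact only if the observations in each class happen to average to the class mark; otherwise it is an approximation.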