CHAPTER 7

CREATE ORDER IN DATA

Too little attention is given to the need for statistical control, or to put it more pertinently, since statistical control (randomness) is so rarely found, too little attention is given to the interpretation of data that arise from conditions not in statistical control. (W. E. Deming, 1940)

This chapter is concerned with techniques for creating order in data. Such techniques are helpful as first steps when one is confronted with the interpretation of inhomogeneous data, or, to use Deming’s words, with the “interpretation of data that arise from conditions not in statistical control”.

The methods I shall describe are not geared towards giving quantitative statistical results (such as confidence levels), but rather towards providing qualitative intuitive insights with the help of interpretable graphical layouts. The figures reproduced here were chosen to illustrate such layouts. Most of them (those with square frames) belong to the category of exploration graphs (see Section 2.6.2) and were produced by a PostScript hardcopy facility.

7.1 GENERAL CONSIDERATIONS

Assume you have a large pile of data without obvious structure. Most likely it will be heterogeneous. The standard first data-analytic step, an unaided preliminary inspection of the raw data, will almost inevitably be confusing. Prior to any analysis and interpretation we must create some order. The common underlying idea is to arrange the data in such a way that items that are “similar” in ...

