CHAPTER 2 DESCRIBING DATA

2.1 OVERVIEW

The starting point for data analysis is a data table (often referred to as a data set), which contains the measured or collected data values represented as numbers or text. The data in these tables are called raw data until they have been transformed or modified. The values can be measurements, such as a patient's weight (150 lb, 175 lb, and so on), or categories, such as the industrial sector (the “telecommunications industry,” the “energy industry,” and so on) used to classify a company. A data table lists the items over which the data have been collected or measured, such as different patients or specific companies, and records the information of interest about each item as a set of attributes. The individual items are usually shown as rows and the attributes as columns. This chapter examines ways in which individual attributes can be described and summarized: the scales on which they are measured, how to describe their center and variation using descriptive statistics, and how to make statements about them using inferential statistical methods, such as confidence intervals or hypothesis tests.
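The row-and-column layout described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the book: the patient records, the attribute names (`patient_id`, `weight_lb`, `smoker`), and the helper functions `column` and `summarize` are all hypothetical. It covers only the descriptive side (center and variation) and leaves out the inferential methods.

```python
from statistics import mean, median, stdev

# Hypothetical raw data table: each dict is one row (an item, here a patient)
# and each key is one column (an attribute). All values are invented.
patients = [
    {"patient_id": "P1", "weight_lb": 150, "smoker": "no"},
    {"patient_id": "P2", "weight_lb": 175, "smoker": "yes"},
    {"patient_id": "P3", "weight_lb": 160, "smoker": "no"},
    {"patient_id": "P4", "weight_lb": 190, "smoker": "no"},
]


def column(table, name):
    """Return the values of one attribute (column) across all rows."""
    return [row[name] for row in table]


def summarize(values):
    """Describe a numeric attribute by its center and its variation."""
    return {
        "mean": mean(values),      # center: arithmetic average
        "median": median(values),  # center: middle value
        "stdev": stdev(values),    # variation: sample standard deviation
    }


weights = column(patients, "weight_lb")
summary = summarize(weights)
print(weights)
print(summary)
```

A list of dictionaries is used here only because it needs nothing beyond the standard library; a dedicated table structure from a data analysis library would serve the same purpose.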

2.2 OBSERVATIONS AND VARIABLES

All disciplines collect data about items that are important to that field. Medical researchers collect data on patients, the automotive industry on cars, and retail companies on transactions. These items are organized into ...

