CHAPTER 4 UNDERSTANDING RELATIONSHIPS

4.1 OVERVIEW

A critical step in making sense of data is understanding the relationships between variables. For example, is there a relationship between interest rates and inflation, or between education level and income? The existence of an association between variables does not imply that one variable causes the other. These relationships or associations can be established by examining summary tables and data visualizations, as well as through calculations that measure the strength of the relationship and the confidence in it. The following sections examine a number of ways to understand relationships between pairs of variables through data visualizations, tables that summarize the data, and specific calculated metrics. Each approach is driven by how the variables have been categorized, such as the scale on which they are measured. Data visualizations are important because they take advantage of the human visual system's ability to recognize complex patterns in what is seen graphically.
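As one illustration of a calculation that measures the strength of a relationship, the sketch below computes the Pearson correlation coefficient for two continuous variables. The choice of this particular metric, the function name `pearson_r`, and the sample values are assumptions made for illustration; they are not taken from the text. A value near +1 or -1 suggests a strong linear association, and a value near 0 suggests little or none. A strong value still says nothing about causation.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of each observation's deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical data, invented for illustration only.
years_of_education = [10, 12, 12, 14, 16, 16, 18, 20]
income_thousands = [28, 35, 33, 42, 55, 51, 64, 70]

r = pearson_r(years_of_education, income_thousands)
print(f"r = {r:.3f}")
```

The function uses only the standard library so that each step of the calculation is visible; in practice a statistics package would typically be used instead.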

4.2 VISUALIZING RELATIONSHIPS BETWEEN VARIABLES

4.2.1 Scatterplots

Scatterplots can be used to identify whether a relationship exists between two continuous variables measured on the ratio or interval scales. The two variables are plotted on the x- and y-axes. Each point displayed on the scatterplot is a single observation. The position of the point ...
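The idea that each observation becomes one point, placed according to its values on the two variables, can be sketched with a small text-based plot. This is a minimal sketch under stated assumptions: the function name `ascii_scatter`, the grid size, and the sample data are hypothetical, and a plotting library would normally be used to draw a real scatterplot.

```python
def ascii_scatter(xs, ys, width=40, height=12):
    """Return a text scatterplot with one '*' per observation."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid division by zero when a variable has a single distinct value.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column (x-axis) and a row (y-axis).
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        # Row 0 of the grid is the top line, so flip the y direction.
        grid[height - 1 - row][col] = "*"
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width


# Hypothetical observations, invented for illustration only.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_values = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8]

print(ascii_scatter(x_values, y_values))
```

Observations that fall in the same grid cell share a single marker, so this sketch is suited only to small data sets.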
