CHAPTER 6 BUILDING MODELS FROM DATA

6.1 OVERVIEW

In Chapter 4, we looked at different ways to understand and quantify relationships between variables. Is there a relationship between age and cholesterol levels? Do patients in a clinical trial taking a drug have improved outcomes versus patients taking a placebo? A formal way to describe, encode, and test whether and how one or more variables relate to others is to build and evaluate models from the data. These models describe important relationships in the data, including the strength and direction (positive or negative) of each relationship. The models can encode both linear and nonlinear relationships, and they can be used to confirm a hypothesis about those relationships. All these uses help to summarize and understand the data. However, one of the most widely used applications of a model is making predictions. For example, a data set of historical purchases, along with customer geographical and demographic data (such as the customer's age, location, and salary), could be collected and used to generate a model that encodes which types of products customers purchase. Once the model is built, it could be used to identify, from a list of prospective customers, those most likely to make a purchase. The prospects at the top of this prioritized list could then be targeted with marketing material or other promotions.
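As a rough illustration of the purchase example, the following sketch fits a small logistic regression model to historical records and then ranks prospects by their predicted probability of purchasing. Everything in it is an assumption made for illustration: the records and prospects are invented, the two features (age and salary) are chosen arbitrarily, and logistic regression fitted by gradient descent is only one of many modeling methods that could be used here.

```python
import math

# Hypothetical historical records (invented for illustration):
# (age, salary in thousands, purchased: 1 = yes, 0 = no).
history = [
    (22, 28, 0), (25, 32, 0), (31, 40, 0), (35, 52, 0),
    (38, 61, 1), (44, 70, 1), (49, 82, 1), (55, 90, 1),
]

def standardize_params(rows):
    """Return a (mean, standard deviation) pair for each of the two features."""
    params = []
    for j in range(2):
        col = [r[j] for r in rows]
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        params.append((mean, std))
    return params

def scale(features, params):
    """Put the features on a common scale so gradient descent behaves well."""
    return [(v - m) / s for v, (m, s) in zip(features, params)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, params, steps=2000, rate=0.1):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for *features, label in rows:
            x = scale(features, params)
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - label
            grad_w = [g + error * v for g, v in zip(grad_w, x)]
            grad_b += error
        weights = [w - rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= rate * grad_b / len(rows)
    return weights, bias

def purchase_probability(features, params, weights, bias):
    """Predicted probability that a customer with these features purchases."""
    x = scale(features, params)
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

params = standardize_params(history)
weights, bias = fit_logistic(history, params)

# Hypothetical prospects: name -> (age, salary in thousands).
prospects = {"A": (24, 30), "B": (47, 78), "C": (36, 55)}

# Prioritized list: most likely purchasers first.
ranked = sorted(
    prospects,
    key=lambda name: purchase_probability(prospects[name], params, weights, bias),
    reverse=True,
)
print(ranked)
```

The sign of each fitted weight indicates the direction of the relationship between that feature and purchasing, while the ranked list is the kind of prioritized output described above. With so few invented records, the fitted values themselves carry no real meaning.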

In this chapter, we will review how models can be built from data sets. A model is usually built to predict values for a specific variable. ...
