Chapter 9

Preparing Data

IN THIS CHAPTER

Documenting your business objectives

Processing your data

Sampling your data

Transforming your data

Extracting features

Selecting features

The roadmap to building a successful predictive model involves defining business objectives, preparing the data, and then building and deploying the model. This chapter delves into data preparation, which involves

  • Acquiring the data
  • Exploring the data
  • Cleaning the data
  • Selecting variables of interest
  • Generating derived variables
  • Extracting, loading, and transforming the data
  • Sampling the data into training and test datasets
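To make a few of these steps concrete, here is a minimal Python sketch of cleaning the data, generating a derived variable, and sampling into training and test datasets. Everything in it is an assumption made for illustration: the sample records, the field names (`original_price`, `paid_price`, `discount`), the helper function names, and the 80/20 split are all hypothetical, and a real project would more likely lean on a data library than on plain lists.

```python
import random

# Hypothetical purchase records; None marks a missing value.
records = [
    {"customer": "A", "original_price": 20.0, "paid_price": 15.0},
    {"customer": "B", "original_price": 30.0, "paid_price": 30.0},
    {"customer": "C", "original_price": None, "paid_price": 12.0},
    {"customer": "D", "original_price": 50.0, "paid_price": 40.0},
    {"customer": "E", "original_price": 10.0, "paid_price": 9.0},
    {"customer": "F", "original_price": 25.0, "paid_price": 20.0},
]


def clean(rows):
    """Cleaning: drop any row that has a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def add_discount(rows):
    """Derived variable: discount as a fraction of the original price.

    Assumes original_price is nonzero in every cleaned row.
    """
    return [
        {**r, "discount": (r["original_price"] - r["paid_price"]) / r["original_price"]}
        for r in rows
    ]


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Sampling: shuffle a copy of the rows, then split off a test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


prepared = add_discount(clean(records))
train, test = train_test_split(prepared)
print(len(prepared), len(train), len(test))
```

Shuffling before splitting keeps the test set from simply being the last few rows in the file, and fixing the seed lets you reproduce the same split later when you compare models.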

Data is a four-letter word. It's amazing that such a small word can describe trillions of gigabytes of information: customer names, addresses, products, discounted versus original prices, store codes, times of purchase, supplier locations, run rates for print advertising, the color of your delivery vans. And that's just for openers. Data is, or can be, literally everything.

Not every source or type of data will be relevant to the business question you're trying to answer. Predictive analytics models are built from multiple data sources, and one of the first critical steps is to determine which sources to include in your model. If you're trying to determine (for example) whether customers who subscribe to e-magazines in the spring are more likely to purchase hardcover print books in the fall, you may decide to omit the January paperback sales records. Then you have to vet the specific ...
