Chapter 5. Data Preparation – Integrate and Format

In this chapter, we will cover:

  • Speeding up a merge with caching and optimization settings
  • Merging a lookup table
  • Shuffle-down (nonstandard aggregation)
  • Cartesian product merge using key-less merge by key
  • Multiplying out using Cartesian product merge, user source, and derive dummy
  • Changing large numbers of variable names without scripting
  • Parsing nonstandard dates
  • Parsing and performing a conversion on a complex stream
  • Sequence processing

Introduction

This set of recipes contains tricks and shortcuts for tasks that most analysts would anticipate as central to data preparation. Two subtasks are addressed: integrate and format. The first four recipes involve aspects of integration, and the last two involve ...
