Substitution

Substitution is the process of replacing portions of data with computed data. Mathematically, it can be defined as:

y = f(x)

where x is the source and y is the output of the function f.
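
The definition above can be illustrated with a minimal sketch. It assumes a record held as a dictionary and a hypothetical masking function standing in for f. The names `substitute` and `mask_all_but_last_four` and the sample record are illustrative assumptions, not taken from the book.

```python
from typing import Callable, Dict


def substitute(record: Dict[str, str], field: str,
               f: Callable[[str], str]) -> Dict[str, str]:
    """Return a copy of record in which record[field] (x) is replaced by f(x) (y)."""
    result = dict(record)          # copy so the source record is left untouched
    result[field] = f(record[field])
    return result


def mask_all_but_last_four(x: str) -> str:
    """Hypothetical f: hide every character except the last four."""
    return "*" * max(len(x) - 4, 0) + x[-4:]


# Illustrative data only.
record = {"name": "Alice", "card": "4111111111111111"}
masked = substitute(record, "card", mask_all_but_last_four)
print(masked)
```

Only the chosen portion of the data is replaced. The rest of the record passes through unchanged, and the source is not modified. A different mechanism can be used by passing a different function as f.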

To choose the correct substitution mechanism, we need to understand how the data is going to be used, who the target audience is, and the environment through which the data flows. Let's look at the various available substitution mechanisms.
