Variance

This technique is useful for numeric data types. It can also be applied to date/time values.

This is a statistical approach in which we algorithmically vary the input data by a factor of +/- X percent. The value of X depends on the analysis we are doing, and it should be chosen so that the varied data still gives the same overall understanding of the business figures.
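The idea of varying a number by +/- X percent can be sketched in a few lines of Python. This is a minimal illustration rather than the book's implementation: the function names `fixed_variance` and `dynamic_variance`, their parameters, and the choice of a uniformly random percentage with a random sign are all assumptions made for the example.

```python
import random


def fixed_variance(value, percent):
    """Shift value by a fixed percentage; a negative percent decreases it."""
    return value + value * percent / 100


def dynamic_variance(value, min_percent, max_percent, rng=random):
    """Shift value by a random percentage (assumed uniform) with a random sign.

    The magnitude of the change lies between min_percent and max_percent.
    """
    magnitude = rng.uniform(min_percent, max_percent)
    sign = rng.choice((-1, 1))
    return value + sign * value * magnitude / 100


if __name__ == "__main__":
    print(fixed_variance(100, 10))     # increase by 10%
    print(fixed_variance(100, -10))    # decrease by 10%
    print(dynamic_variance(100, 1, 5)) # somewhere within 1% to 5% of 100, up or down
```

Passing a seeded `random.Random` instance as `rng` makes the dynamic variant repeatable, which can help when the same masked data set has to be regenerated.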

Let's see a few examples:

Input Data     Output Data    Method              Explanation
100            110            Fixed variance      Increase by 10%
100            90             Fixed variance      Decrease by 10%
1-Jan-2000     1-Feb-2000     Fixed variance      Add 1 month
1-Aug-2000     1-Jul-2000     Fixed variance      Reduce by 1 month
100            101            Dynamic variance    1% to 5% increase or decrease
100
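The date examples above (add or reduce by one month) can be sketched with Python's standard library. The helper name `shift_months` is hypothetical, and clamping the day to the last day of the target month (so that 31-Jan plus one month lands on the last day of February) is an assumption, since the text does not say how such cases should be handled.

```python
import calendar
from datetime import date


def shift_months(d, months):
    """Move a date by a whole number of months (negative moves backwards).

    Assumption: if the target month is shorter, the day is clamped to
    that month's last day.
    """
    # Count months from year 0 so that year rollover is handled by divmod.
    month_index = d.year * 12 + (d.month - 1) + months
    year, month_zero_based = divmod(month_index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


if __name__ == "__main__":
    print(shift_months(date(2000, 1, 1), 1))   # add 1 month
    print(shift_months(date(2000, 8, 1), -1))  # reduce by 1 month
```

A dynamic variant for dates could draw the number of months (or days) at random within a range, in the same spirit as the dynamic variance row for numbers.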
