  • Shobhit Bhatnagar thinks this is interesting:

Deduplicating

From The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data

Note

Deduplication removes duplicate copies of data and replaces each one with a pointer to the single stored copy. This reduces storage cost and, in turn, overall infrastructure usage and cost.
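A minimal sketch of the pointer-based idea in the note, assuming duplicates are detected by hashing each record's content with SHA-256. The `DedupStore` class and the `store_records` helper are hypothetical illustrations, not taken from the book. Each unique payload is stored once, and every record is replaced by a pointer (its digest).

```python
import hashlib


class DedupStore:
    """Hypothetical content-addressed store: each unique payload is kept once."""

    def __init__(self):
        self._blobs = {}  # digest -> payload bytes (the single stored copy)

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Keep the payload only the first time this content is seen.
        self._blobs.setdefault(digest, data)
        return digest  # the "pointer" that replaces the duplicate

    def get(self, pointer: str) -> bytes:
        return self._blobs[pointer]

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self._blobs.values())


def store_records(store, records):
    """Replace each record with a pointer to its single stored copy."""
    return [store.put(r) for r in records]


store = DedupStore()
records = [b"customer:42,Alice", b"customer:7,Bob", b"customer:42,Alice"]
pointers = store_records(store, records)

raw_bytes = sum(len(r) for r in records)
print(raw_bytes, store.stored_bytes())  # prints "48 31"
```

Here the repeated `customer:42,Alice` record is stored only once, so 48 bytes of raw records shrink to 31 bytes of stored payload plus three small pointers. Real systems often deduplicate fixed- or variable-size chunks rather than whole records, but the store-once, point-everywhere principle is the same.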