Chapter 12. The Really Large Data Warehouse

Data warehouses entail large amounts of data. As testimony to the volumes of data that come with data warehouses, consider the vocabulary of capacity planning. In the days before data warehousing, capacity was measured in kilobytes (KB), megabytes (MB), and occasionally, gigabytes (GB). But once data warehouses appeared, new terms soon entered the vocabulary, including "hundreds of gigabytes," "terabytes," and even "petabytes." Thus, volumes of data have increased by several orders of magnitude with the advent of data warehouses. Figure 12-1 shows the increase as it has typically occurred.

Figure 12-1. The growth of the data warehouse over time is explosive.

As shown in Figure 12-1, in the early days of a data warehouse, there were a few gigabytes of data. These volumes were not surprising and caused no one any particular anguish. Then, a short time passed and there were hundreds of gigabytes of data. This volume was a concern, but only a minor one. Time passed and soon there were several terabytes (trillions of bytes) of data. Now some genuine concerns were starting to be raised. There was the budget to think about. There was database design and administration. There was response time. And more time passed and the organization woke up to 10 or more terabytes of data. The worries of yesterday became the crises of today. Data kept ...
