Chapter 6. The Distributed Data Warehouse

Most organizations build and maintain a single centralized data warehouse environment. This setup makes sense for many reasons:

  • The data in the warehouse is integrated across the corporation, and an integrated view is used only at headquarters.

  • The corporation operates on a centralized business model.

  • The volume of data in the data warehouse is such that a single centralized repository of data makes sense.

  • Even if the data could be integrated, it would be cumbersome to access if it were dispersed across multiple local sites.

In short, the politics, the economics, and the technology greatly favor a single centralized data warehouse. Still, in a few cases, a distributed data warehouse makes sense, as you'll see in this chapter.

Types of Distributed Data Warehouses

The three types of distributed data warehouses are as follows:

  • The business is distributed geographically or over multiple, differing product lines. In this case, there is what can be called a local data warehouse and a global data warehouse. The local data warehouse represents the data and processing at a remote site, and the global data warehouse represents the part of the business that is integrated across the business as a whole.

  • The data warehouse environment will hold a lot of data, and the volume of data will be distributed over multiple processors. Logically there is a single data warehouse, but physically there are many data warehouses that are all tightly related but reside on separate processors. This ...
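
The second type above, one logical data warehouse spread physically over several processors, can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the class name PartitionedWarehouse, the table and key names, and the byte-sum placement rule are all hypothetical, and a real system would use its own partitioning scheme.

```python
class PartitionedWarehouse:
    """Hypothetical sketch: one logical warehouse, many physical nodes."""

    def __init__(self, node_count):
        # Each node stands in for a separate processor holding its own tables.
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # Illustrative placement rule (an assumption): sum the key's bytes
        # so that the same key always lands on the same node.
        return sum(key.encode("utf-8")) % len(self.nodes)

    def insert(self, table, key, row):
        # Physically, the row is stored on exactly one node.
        node = self.nodes[self._node_for(key)]
        node.setdefault(table, []).append((key, row))

    def scan(self, table):
        # Logically, callers see a single table that spans every node.
        for node in self.nodes:
            yield from node.get(table, [])


warehouse = PartitionedWarehouse(node_count=3)
for key, amount in [("cust-1", 10), ("cust-2", 25), ("cust-3", 5)]:
    warehouse.insert("sales", key, {"amount": amount})

# The query is written against the logical warehouse, not against any one node.
total = sum(row["amount"] for _, row in warehouse.scan("sales"))
print(total)  # prints 40
```

The point of the sketch is only the split between the physical layout (the list of nodes) and the logical view (the scan method), which hides where each row lives.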
