You are previewing Building the Data Warehouse.
O'Reilly logo
Building the Data Warehouse

Book Description

  • The new edition of the classic bestseller that launched the data warehousing industry covers new approaches and technologies, many of which have been pioneered by Inmon himself

  • In addition to explaining the fundamentals of data warehouse systems, the book covers new topics such as methods for handling unstructured data in a data warehouse and storing data across multiple storage media

  • Discusses the pros and cons of relational versus multidimensional design and how to measure return on investment in planning data warehouse projects

  • Covers advanced topics, including data monitoring and testing

  • Although the book includes an extra 100 pages worth of valuable content, the price has actually been reduced from $65 to $55

Table of Contents

  1. Copyright
  2. Credits
  3. About the Author
  4. Preface for the Second Edition
  5. Acknowledgments
  6. 1. Evolution of Decision Support Systems
    1. 1.1. The Evolution
      1. 1.1.1. The Advent of DASD
      2. 1.1.2. PC/4GL Technology
      3. 1.1.3. Enter the Extract Program
      4. 1.1.4. The Spider Web
    2. 1.2. Problems with the Naturally Evolving Architecture
      1. 1.2.1. Lack of Data Credibility
      2. 1.2.2. Problems with Productivity
      3. 1.2.3. From Data to Information
      4. 1.2.4. A Change in Approach
      5. 1.2.5. The Architected Environment
      6. 1.2.6. Data Integration in the Architected Environment
      7. 1.2.7. Who Is the User?
    3. 1.3. The Development Life Cycle
    4. 1.4. Patterns of Hardware Utilization
    5. 1.5. Setting the Stage for Re-engineering
    6. 1.6. Monitoring the Data Warehouse Environment
    7. 1.7. Summary
  7. 2. The Data Warehouse Environment
    1. 2.1. The Structure of the Data Warehouse
    2. 2.2. Subject Orientation
    3. 2.3. Day 1 to Day n Phenomenon
    4. 2.4. Granularity
      1. 2.4.1. The Benefits of Granularity
      2. 2.4.2. An Example of Granularity
      3. 2.4.3. Dual Levels of Granularity
    5. 2.5. Exploration and Data Mining
    6. 2.6. Living Sample Database
    7. 2.7. Partitioning as a Design Approach
      1. 2.7.1. Partitioning of Data
    8. 2.8. Structuring Data in the Data Warehouse
    9. 2.9. Auditing and the Data Warehouse
    10. 2.10. Data Homogeneity and Heterogeneity
    11. 2.11. Purging Warehouse Data
    12. 2.12. Reporting and the Architected Environment
    13. 2.13. The Operational Window of Opportunity
    14. 2.14. Incorrect Data in the Data Warehouse
    15. 2.15. Summary
  8. 3. The Data Warehouse and Design
    1. 3.1. Beginning with Operational Data
    2. 3.2. Process and Data Models and the Architected Environment
    3. 3.3. The Data Warehouse and Data Models
      1. 3.3.1. The Data Warehouse Data Model
      2. 3.3.2. The Midlevel Data Model
      3. 3.3.3. The Physical Data Model
    4. 3.4. The Data Model and Iterative Development
    5. 3.5. Normalization and Denormalization
      1. 3.5.1. Snapshots in the Data Warehouse
    6. 3.6. Metadata
      1. 3.6.1. Managing Reference Tables in a Data Warehouse
    7. 3.7. Cyclicity of Data — The Wrinkle of Time
    8. 3.8. Complexity of Transformation and Integration
    9. 3.9. Triggering the Data Warehouse Record
      1. 3.9.1. Events
      2. 3.9.2. Components of the Snapshot
      3. 3.9.3. Some Examples
    10. 3.10. Profile Records
    11. 3.11. Managing Volume
    12. 3.12. Creating Multiple Profile Records
    13. 3.13. Going from the Data Warehouse to the Operational Environment
    14. 3.14. Direct Operational Access of Data Warehouse Data
    15. 3.15. Indirect Access of Data Warehouse Data
      1. 3.15.1. An Airline Commission Calculation System
      2. 3.15.2. A Retail Personalization System
      3. 3.15.3. Credit Scoring
    16. 3.16. Indirect Use of Data Warehouse Data
    17. 3.17. Star Joins
    18. 3.18. Supporting the ODS
    19. 3.19. Requirements and the Zachman Framework
    20. 3.20. Summary
  9. 4. Granularity in the Data Warehouse
    1. 4.1. Raw Estimates
    2. 4.2. Input to the Planning Process
    3. 4.3. Data in Overflow
      1. 4.3.1. Overflow Storage
    4. 4.4. What the Levels of Granularity Will Be
    5. 4.5. Some Feedback Loop Techniques
    6. 4.6. Levels of Granularity — Banking Environment
    7. 4.7. Feeding the Data Marts
    8. 4.8. Summary
  10. 5. The Data Warehouse and Technology
    1. 5.1. Managing Large Amounts of Data
    2. 5.2. Managing Multiple Media
    3. 5.3. Indexing and Monitoring Data
    4. 5.4. Interfaces to Many Technologies
    5. 5.5. Programmer or Designer Control of Data Placement
    6. 5.6. Parallel Storage and Management of Data
      1. 5.6.1. Metadata Management
    7. 5.7. Language Interface
    8. 5.8. Efficient Loading of Data
    9. 5.9. Efficient Index Utilization
    10. 5.10. Compaction of Data
    11. 5.11. Compound Keys
    12. 5.12. Variable-Length Data
    13. 5.13. Lock Management
    14. 5.14. Index-Only Processing
    15. 5.15. Fast Restore
    16. 5.16. Other Technological Features
    17. 5.17. DBMS Types and the Data Warehouse
    18. 5.18. Changing DBMS Technology
    19. 5.19. Multidimensional DBMS and the Data Warehouse
    20. 5.20. Data Warehousing across Multiple Storage Media
    21. 5.21. The Role of Metadata in the Data Warehouse Environment
    22. 5.22. Context and Content
      1. 5.22.1. Three Types of Contextual Information
      2. 5.22.2. Capturing and Managing Contextual Information
      3. 5.22.3. Looking at the Past
    23. 5.23. Refreshing the Data Warehouse
    24. 5.24. Testing
    25. 5.25. Summary
  11. 6. The Distributed Data Warehouse
    1. 6.1. Types of Distributed Data Warehouses
      1. 6.1.1. Local and Global Data Warehouses
        1. 6.1.1.1. The Local Data Warehouse
        2. 6.1.1.2. The Global Data Warehouse
        3. 6.1.1.3. Intersection of Global and Local Data
        4. 6.1.1.4. Redundancy
        5. 6.1.1.5. Access of Local and Global Data
      2. 6.1.2. The Technologically Distributed Data Warehouse
      3. 6.1.3. The Independently Evolving Distributed Data Warehouse
    2. 6.2. The Nature of the Development Efforts
      1. 6.2.1. Completely Unrelated Warehouses
    3. 6.3. Distributed Data Warehouse Development
      1. 6.3.1. Coordinating Development across Distributed Locations
      2. 6.3.2. The Corporate Data Model — Distributed
      3. 6.3.3. Metadata in the Distributed Warehouse
    4. 6.4. Building the Warehouse on Multiple Levels
    5. 6.5. Multiple Groups Building the Current Level of Detail
      1. 6.5.1. Different Requirements at Different Levels
      2. 6.5.2. Other Types of Detailed Data
      3. 6.5.3. Metadata
    6. 6.6. Multiple Platforms for Common Detail Data
    7. 6.7. Summary
  12. 7. Executive Information Systems and the Data Warehouse
    1. 7.1. EIS — The Promise
    2. 7.2. A Simple Example
    3. 7.3. Drill-Down Analysis
    4. 7.4. Supporting the Drill-Down Process
    5. 7.5. The Data Warehouse as a Basis for EIS
    6. 7.6. Where to Turn
    7. 7.7. Event Mapping
    8. 7.8. Detailed Data and EIS
    9. 7.9. Keeping Only Summary Data in the EIS
    10. 7.10. Summary
  13. 8. External Data and the Data Warehouse
    1. 8.1. External Data in the Data Warehouse
    2. 8.2. Metadata and External Data
    3. 8.3. Storing External Data
    4. 8.4. Different Components of External Data
    5. 8.5. Modeling and External Data
    6. 8.6. Secondary Reports
    7. 8.7. Archiving External Data
    8. 8.8. Comparing Internal Data to External Data
    9. 8.9. Summary
  14. 9. Migration to the Architected Environment
    1. 9.1. A Migration Plan
    2. 9.2. The Feedback Loop
    3. 9.3. Strategic Considerations
    4. 9.4. Methodology and Migration
    5. 9.5. A Data-Driven Development Methodology
      1. 9.5.1. Data-Driven Methodology
      2. 9.5.2. System Development Life Cycles
      3. 9.5.3. A Philosophical Observation
    6. 9.6. Summary
  15. 10. The Data Warehouse and the Web
    1. 10.1. Supporting the eBusiness Environment
    2. 10.2. Moving Data from the Web to the Data Warehouse
    3. 10.3. Moving Data from the Data Warehouse to the Web
    4. 10.4. Web Support
    5. 10.5. Summary
  16. 11. Unstructured Data and the Data Warehouse
    1. 11.1. Integrating the Two Worlds
      1. 11.1.1. Text — The Common Link
      2. 11.1.2. A Fundamental Mismatch
      3. 11.1.3. Matching Text across the Environments
      4. 11.1.4. A Probabilistic Match
      5. 11.1.5. Matching All the Information
    2. 11.2. A Themed Match
      1. 11.2.1. Industrially Recognized Themes
      2. 11.2.2. Naturally Occurring Themes
      3. 11.2.3. Linkage through Themes and Themed Words
      4. 11.2.4. Linkage through Abstraction and Metadata
    3. 11.3. A Two-Tiered Data Warehouse
      1. 11.3.1. Dividing the Unstructured Data Warehouse
      2. 11.3.2. Documents in the Unstructured Data Warehouse
      3. 11.3.3. Visualizing Unstructured Data
    4. 11.4. A Self-Organizing Map (SOM)
      1. 11.4.1. The Unstructured Data Warehouse
      2. 11.4.2. Volumes of Data and the Unstructured Data Warehouse
    5. 11.5. Fitting the Two Environments Together
    6. 11.6. Summary
  17. 12. The Really Large Data Warehouse
    1. 12.1. Why the Rapid Growth?
    2. 12.2. The Impact of Large Volumes of Data
      1. 12.2.1. Basic Data-Management Activities
      2. 12.2.2. The Cost of Storage
      3. 12.2.3. The Real Costs of Storage
      4. 12.2.4. The Usage Pattern of Data in the Face of Large Volumes
      5. 12.2.5. A Simple Calculation
      6. 12.2.6. Two Classes of Data
      7. 12.2.7. Implications of Separating Data into Two Classes
    3. 12.3. Disk Storage in the Face of Data Separation
      1. 12.3.1. Near-Line Storage
      2. 12.3.2. Access Speed and Disk Storage
      3. 12.3.3. Archival Storage
      4. 12.3.4. Implications of Transparency
    4. 12.4. Moving Data from One Environment to Another
      1. 12.4.1. The CMSM Approach
      2. 12.4.2. A Data Warehouse Usage Monitor
      3. 12.4.3. The Extension of the Data Warehouse across Different Storage Media
    5. 12.5. Inverting the Data Warehouse
    6. 12.6. Total Cost
    7. 12.7. Maximum Capacity
    8. 12.8. Summary
  18. 13. The Relational and the Multidimensional Model as a Basis for Database Design
    1. 13.1. The Relational Model
    2. 13.2. The Multidimensional Model
    3. 13.3. Snowflake Structures
    4. 13.4. Differences between the Models
      1. 13.4.1. The Roots of the Differences
      2. 13.4.2. Reshaping Relational Data
      3. 13.4.3. Indirect Access and Direct Access of Data
      4. 13.4.4. Servicing Future Unknown Needs
      5. 13.4.5. Servicing the Need to Change Gracefully
    5. 13.5. Independent Data Marts
    6. 13.6. Building Independent Data Marts
    7. 13.7. Summary
  19. 14. Data Warehouse Advanced Topics
    1. 14.1. End-User Requirements and the Data Warehouse
      1. 14.1.1. The Data Warehouse and the Data Model
      2. 14.1.2. The Relational Foundation
      3. 14.1.3. The Data Warehouse and Statistical Processing
    2. 14.2. Resource Contention in the Data Warehouse
      1. 14.2.1. The Exploration Warehouse
      2. 14.2.2. The Data Mining Warehouse
      3. 14.2.3. Freezing the Exploration Warehouse
      4. 14.2.4. External Data and the Exploration Warehouse
    3. 14.3. Data Marts and Data Warehouses in the Same Processor
    4. 14.4. The Life Cycle of Data
      1. 14.4.1. Mapping the Life Cycle to the Data Warehouse Environment
    5. 14.5. Testing and the Data Warehouse
    6. 14.6. Tracing the Flow of Data through the Data Warehouse
      1. 14.6.1. Data Velocity in the Data Warehouse
      2. 14.6.2. "Pushing" and "Pulling" Data
    7. 14.7. Data Warehouse and the Web-Based eBusiness Environment
      1. 14.7.1. The Interface between the Two Environments
      2. 14.7.2. The Granularity Manager
      3. 14.7.3. Profile Records
      4. 14.7.4. The ODS, Profile Records, and Performance
    8. 14.8. The Financial Data Warehouse
    9. 14.9. The System of Record
    10. 14.10. A Brief History of Architecture — Evolving to the Corporate Information Factory
      1. 14.10.1. Evolving from the CIF
      2. 14.10.2. Obstacles
    11. 14.11. CIF — Into the Future
      1. 14.11.1. Analytics
      2. 14.11.2. ERP/SAP
      3. 14.11.3. Unstructured Data
      4. 14.11.4. Volumes of Data
    12. 14.12. Summary
  20. 15. Cost-Justification and Return on Investment for a Data Warehouse
    1. 15.1. Copying the Competition
    2. 15.2. The Macro Level of Cost-Justification
    3. 15.3. A Micro Level Cost-Justification
    4. 15.4. Information from the Legacy Environment
      1. 15.4.1. The Cost of New Information
      2. 15.4.2. Gathering Information with a Data Warehouse
      3. 15.4.3. Comparing the Costs
      4. 15.4.4. Building the Data Warehouse
      5. 15.4.5. A Complete Picture
      6. 15.4.6. Information Frustration
    5. 15.5. The Time Value of Data
      1. 15.5.1. The Speed of Information
    6. 15.6. Integrated Information
      1. 15.6.1. The Value of Historical Data
      2. 15.6.2. Historical Data and CRM
    7. 15.7. Summary
  21. 16. The Data Warehouse and the ODS
    1. 16.1. Complementary Structures
      1. 16.1.1. Updates in the ODS
      2. 16.1.2. Historical Data and the ODS
      3. 16.1.3. Profile Records
    2. 16.2. Different Classes of ODS
    3. 16.3. Database Design — A Hybrid Approach
    4. 16.4. Drawn to Proportion
    5. 16.5. Transaction Integrity in the ODS
    6. 16.6. Time Slicing the ODS Day
    7. 16.7. Multiple ODS
    8. 16.8. ODS and the Web Environment
    9. 16.9. An Example of an ODS
    10. 16.10. Summary
  22. 17. Corporate Information Compliance and Data Warehousing
    1. 17.1. Two Basic Activities
    2. 17.2. Financial Compliance
      1. 17.2.1. The "What"
      2. 17.2.2. The "Why"
    3. 17.3. Auditing Corporate Communications
    4. 17.4. Summary
  23. 18. The End-User Community
    1. 18.1. The Farmer
    2. 18.2. The Explorer
    3. 18.3. The Miner
    4. 18.4. The Tourist
    5. 18.5. The Community
    6. 18.6. Different Types of Data
    7. 18.7. Cost-Justification and ROI Analysis
    8. 18.8. Summary
  24. 19. Data Warehouse Design Review Checklist
    1. 19.1. When to Do a Design Review
    2. 19.2. Who Should Be in the Design Review?
    3. 19.3. What Should the Agenda Be?
    4. 19.4. The Results
    5. 19.5. Administering the Review
    6. 19.6. A Typical Data Warehouse Design Review
    7. 19.7. Summary
  25. Glossary
  26. References
    1. Articles
    2. Books
    3. White Papers