You are previewing The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling.
O'Reilly logo
The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling

Book Description

Single most authoritative guide from the inventor of the technique.

  • Presents unique modeling techniques for e-commerce, and shows strategies for optimizing performance.

  • Companion Web site provides updates on dimensional modeling techniques, links related to sites, and source code where appropriate.

Table of Contents

  1. Copyright
  2. ACKNOWLEDGMENTS
  3. INTRODUCTION
    1. Intended Audience
    2. Chapter Preview
      1. Navigation Aids
    3. Purpose of Each Chapter
      1. Chapter 1: Dimensional Modeling Primer
      2. Chapter 2: Retail Sales
      3. Chapter 3: Inventory
      4. Chapter 4: Procurement
      5. Chapter 5: Order Management
      6. Chapter 6: Customer Relationship Management
      7. Chapter 7: Accounting
      8. Chapter 8: Human Resources Management
      9. Chapter 9: Financial Services
      10. Chapter 10: Telecommunications and Utilities
      11. Chapter 11: Transportation
      12. Chapter 12: Education
      13. Chapter 13: Health Care
      14. Chapter 14: Electronic Commerce
      15. Chapter 15: Insurance
      16. Chapter 16: Building the Data Warehouse
      17. Chapter 17: Present Imperatives and Future Outlook
      18. Glossary
    4. Companion Web Site
    5. Summary
  4. 1. Dimensional Modeling Primer
    1. 1.1. Different Information Worlds
    2. 1.2. Goals of a Data Warehouse
      1. 1.2.1. The Publishing Metaphor
    3. 1.3. Components of a Data Warehouse
      1. 1.3.1. Operational Source Systems
      2. 1.3.2. Data Staging Area
      3. 1.3.3. Data Presentation
      4. 1.3.4. Data Access Tools
      5. 1.3.5. Additional Considerations
    4. 1.4. Dimensional Modeling Vocabulary
      1. 1.4.1. Fact Table
      2. 1.4.2. Dimension Tables
      3. 1.4.3. Bringing Together Facts and Dimensions
    5. 1.5. Dimensional Modeling Myths
      1. 1.5.1. Common Pitfalls to Avoid
    6. 1.6. Summary
  5. 2. Retail Sales
    1. 2.1. Four-Step Dimensional Design Process
    2. 2.2. Retail Case Study
      1. 2.2.1. Step 1. Select the Business Process
      2. 2.2.2. Step 2. Declare the Grain
      3. 2.2.3. Step 3. Choose the Dimensions
      4. 2.2.4. Step 4. Identify the Facts
    3. 2.3. Dimension Table Attributes
      1. 2.3.1. Date Dimension
      2. 2.3.2. Product Dimension
      3. 2.3.3. Store Dimension
      4. 2.3.4. Promotion Dimension
      5. 2.3.5. Degenerate Transaction Number Dimension
    4. 2.4. Retail Schema in Action
    5. 2.5. Retail Schema Extensibility
    6. 2.6. Resisting Comfort Zone Urges
      1. 2.6.1. Dimension Normalization (Snowflaking)
      2. 2.6.2. Too Many Dimensions
    7. 2.7. Surrogate Keys
    8. 2.8. Market Basket Analysis
    9. 2.9. Summary
  6. 3. Inventory
    1. 3.1. Introduction to the Value Chain
    2. 3.2. Inventory Models
      1. 3.2.1. Inventory Periodic Snapshot
      2. 3.2.2. Inventory Transactions
      3. 3.2.3. Inventory Accumulating Snapshot
    3. 3.3. Value Chain Integration
    4. 3.4. Data Warehouse Bus Architecture
      1. 3.4.1. Data Warehouse Bus Matrix
      2. 3.4.2. Conformed Dimensions
      3. 3.4.3. Conformed Facts
    5. 3.5. Summary
  7. 4. Procurement
    1. 4.1. Procurement Case Study
    2. 4.2. Procurement Transactions
      1. 4.2.1. Multiple- versus Single-Transaction Fact Tables
      2. 4.2.2. Complementary Procurement Snapshot
    3. 4.3. Slowly Changing Dimensions
      1. 4.3.1. Type 1: Overwrite the Value
      2. 4.3.2. Type 2: Add a Dimension Row
      3. 4.3.3. Type 3: Add a Dimension Column
    4. 4.4. Hybrid Slowly Changing Dimension Techniques
      1. 4.4.1. Predictable Changes with Multiple Version Overlays
      2. 4.4.2. Unpredictable Changes with Single-Version Overlay
    5. 4.5. More Rapidly Changing Dimensions
    6. 4.6. Summary
  8. 5. Order Management
    1. 5.1. Introduction to Order Management
    2. 5.2. Order Transactions
      1. 5.2.1. Fact Normalization
      2. 5.2.2. Dimension Role-Playing
      3. 5.2.3. Product Dimension Revisited
      4. 5.2.4. Customer Ship-To Dimension
      5. 5.2.5. Deal Dimension
      6. 5.2.6. Degenerate Dimension for Order Number
      7. 5.2.7. Junk Dimensions
      8. 5.2.8. Multiple Currencies
      9. 5.2.9. Header and Line Item Facts with Different Granularity
    3. 5.3. Invoice Transactions
      1. 5.3.1. Profit and Loss Facts
      2. 5.3.2. Profitability—The Most Powerful Data Mart
      3. 5.3.3. Profitability Words of Warning
      4. 5.3.4. Customer Satisfaction Facts
    4. 5.4. Accumulating Snapshot for the Order Fulfillment Pipeline
      1. 5.4.1. Lag Calculations
      2. 5.4.2. Multiple Units of Measure
      3. 5.4.3. Beyond the Rear-View Mirror
    5. 5.5. Fact Table Comparison
      1. 5.5.1. Transaction Fact Tables
      2. 5.5.2. Periodic Snapshot Fact Tables
      3. 5.5.3. Accumulating Snapshot Fact Tables
    6. 5.6. Designing Real-Time Partitions
      1. 5.6.1. Requirements for the Real-Time Partition
      2. 5.6.2. Transaction Grain Real-Time Partition
      3. 5.6.3. Periodic Snapshot Real-Time Partition
      4. 5.6.4. Accumulating Snapshot Real-Time Partition
    7. 5.7. Summary
  9. 6. Customer Relationship Management
    1. 6.1. CRM Overview
      1. 6.1.1. Operational and Analytic CRM
      2. 6.1.2. Packaged CRM
    2. 6.2. Customer Dimension
      1. 6.2.1. Name and Address Parsing
      2. 6.2.2. Other Common Customer Attributes
      3. 6.2.3. Dimension Outriggers for a Low-Cardinality Attribute Set
      4. 6.2.4. Large Changing Customer Dimensions
      5. 6.2.5. Implications of Type 2 Customer Dimension Changes
      6. 6.2.6. Customer Behavior Study Groups
      7. 6.2.7. Commercial Customer Hierarchies
      8. 6.2.8. Combining Multiple Sources of Customer Data
    3. 6.3. Analyzing Customer Data from Multiple Business Processes
    4. 6.4. Summary
  10. 7. Accounting
    1. 7.1. Accounting Case Study
    2. 7.2. General Ledger Data
      1. 7.2.1. General Ledger Periodic Snapshot
      2. 7.2.2. General Ledger Journal Transactions
      3. 7.2.3. Financial Statements
    3. 7.3. Budgeting Process
      1. 7.3.1. Consolidated Fact Tables
    4. 7.4. Role of OLAP and Packaged Analytic Solutions
    5. 7.5. Summary
  11. 8. Human Resources Management
    1. 8.1. Time-Stamped Transaction Tracking in a Dimension
    2. 8.2. Time-Stamped Dimension with Periodic Snapshot Facts
    3. 8.3. Audit Dimension
    4. 8.4. Keyword Outrigger Dimension
      1. 8.4.1. AND/OR Dilemma
      2. 8.4.2. Searching for Substrings
    5. 8.5. Survey Questionnaire Data
    6. 8.6. Summary
  12. 9. Financial Services
    1. 9.1. Banking Case Study
    2. 9.2. Dimension Triage
      1. 9.2.1. Household Dimension
      2. 9.2.2. Multivalued Dimensions
      3. 9.2.3. Minidimensions Revisited
    3. 9.3. Arbitrary Value Banding of Facts
    4. 9.4. Point-in-Time Balances
    5. 9.5. Heterogeneous Product Schemas
      1. 9.5.1. Heterogeneous Products with Transaction Facts
    6. 9.6. Summary
  13. 10. Telecommunications and Utilities
    1. 10.1. Telecommunications Case Study
    2. 10.2. General Design Review Considerations
      1. 10.2.1. Granularity
      2. 10.2.2. Date Dimension
      3. 10.2.3. Degenerate Dimensions
      4. 10.2.4. Dimension Decodes and Descriptions
      5. 10.2.5. Surrogate Keys
      6. 10.2.6. Too Many (or Too Few) Dimensions
    3. 10.3. Draft Design Exercise Discussion
    4. 10.4. Geographic Location Dimension
      1. 10.4.1. Location Outrigger
      2. 10.4.2. Leveraging Geographic Information Systems
    5. 10.5. Summary
  14. 11. Transportation
    1. 11.1. Airline Frequent Flyer Case Study
      1. 11.1.1. Multiple Fact Table Granularities
      2. 11.1.2. Linking Segments into Trips
    2. 11.2. Extensions to Other Industries
      1. 11.2.1. Cargo Shipper
      2. 11.2.2. Travel Services
    3. 11.3. Combining Small Dimensions into a Superdimension
      1. 11.3.1. Class of Service
      2. 11.3.2. Origin and Destination
    4. 11.4. More Date and Time Considerations
      1. 11.4.1. Country-Specific Calendars
      2. 11.4.2. Time of Day as a Dimension or Fact
      3. 11.4.3. Date and Time in Multiple Time Zones
    5. 11.5. Summary
  15. 12. Education
    1. 12.1. University Case Study
    2. 12.2. Accumulating Snapshot for Admissions Tracking
    3. 12.3. Factless Fact Tables
      1. 12.3.1. Student Registration Events
      2. 12.3.2. Facilities Utilization Coverage
      3. 12.3.3. Student Attendance Events
    4. 12.4. Other Areas of Analytic Interest
    5. 12.5. Summary
  16. 13. Health Care
    1. 13.1. Health Care Value Circle
    2. 13.2. Health Care Bill
      1. 13.2.1. Roles Played By the Date Dimension
      2. 13.2.2. Multivalued Diagnosis Dimension
      3. 13.2.3. Extending a Billing Fact Table to Show Profitability
      4. 13.2.4. Dimensions for Billed Hospital Stays
    3. 13.3. Complex Health Care Events
    4. 13.4. Medical Records
      1. 13.4.1. Fact Dimension for Sparse Facts
    5. 13.5. Going Back in Time
      1. 13.5.1. Late-Arriving Fact Rows
      2. 13.5.2. Late-Arriving Dimension Rows
    6. 13.6. Summary
  17. 14. Electronic Commerce
    1. 14.1. Web Client-Server Interactions Tutorial
    2. 14.2. Why the Clickstream Is Not Just Another Data Source
      1. 14.2.1. Challenges of Tracking with Clickstream Data
      2. 14.2.2. Specific Dimensions for the Clickstream
    3. 14.3. Clickstream Fact Table for Complete Sessions
    4. 14.4. Clickstream Fact Table for Individual Page Events
    5. 14.5. Aggregate Clickstream Fact Tables
    6. 14.6. Integrating the Clickstream Data Mart into the Enterprise Data Warehouse
    7. 14.7. Electronic Commerce Profitability Data Mart
    8. 14.8. Summary
  18. 15. Insurance
    1. 15.1. Insurance Case Study
      1. 15.1.1. Insurance Value Chain
      2. 15.1.2. Draft Insurance Bus Matrix
    2. 15.2. Policy Transactions
      1. 15.2.1. Dimension Details and Techniques
      2. 15.2.2. Alternative (or Complementary) Policy Accumulating Snapshot
    3. 15.3. Policy Periodic Snapshot
      1. 15.3.1. Conformed Dimensions
      2. 15.3.2. Conformed Facts
      3. 15.3.3. Heterogeneous Products Again
      4. 15.3.4. Multivalued Dimensions Again
    4. 15.4. More Insurance Case Study Background
      1. 15.4.1. Updated Insurance Bus Matrix
    5. 15.5. Claims Transactions
    6. 15.6. Claims Accumulating Snapshot
    7. 15.7. Policy/Claims Consolidated Snapshot
    8. 15.8. Factless Accident Events
    9. 15.9. Common Dimensional Modeling Mistakes to Avoid
    10. 15.10. Summary
  19. 16. Building the Data Warehouse
    1. 16.1. Business Dimensional Lifecycle Road Map
      1. 16.1.1. Road Map Major Points of Interest
    2. 16.2. Project Planning and Management
      1. 16.2.1. Assessing Readiness
      2. 16.2.2. Scoping
      3. 16.2.3. Justification
      4. 16.2.4. Staffing
      5. 16.2.5. Developing and Maintaining the Project Plan
    3. 16.3. Business Requirements Definition
      1. 16.3.1. Requirements Preplanning
      2. 16.3.2. Collecting the Business Requirements
      3. 16.3.3. Postcollection Documentation and Follow-up
    4. 16.4. Lifecycle Technology Track
    5. 16.5. Technical Architecture Design
      1. 16.5.1. Eight-Step Process for Creating the Technical Architecture
    6. 16.6. Product Selection and Installation
    7. 16.7. Lifecycle Data Track
    8. 16.8. Dimensional Modeling
    9. 16.9. Physical Design
      1. 16.9.1. Aggregation Strategy
      2. 16.9.2. Initial Indexing Strategy
    10. 16.10. Data Staging Design and Development
      1. 16.10.1. Dimension Table Staging
      2. 16.10.2. Fact Table Staging
    11. 16.11. Lifecycle Analytic Applications Track
      1. 16.11.1. Analytic Application Specification
      2. 16.11.2. Analytic Application Development
    12. 16.12. Deployment
    13. 16.13. Maintenance and Growth
    14. 16.14. Common Data Warehousing Mistakes to Avoid
    15. 16.15. Summary
  20. 17. Present Imperatives and Future Outlook
    1. 17.1. Ongoing Technology Advances
    2. 17.2. Political Forces Demanding Security and Affecting Privacy
      1. 17.2.1. Conflict between Beneficial Uses and Insidious Abuses
      2. 17.2.2. Who Owns Your Personal Data?
      3. 17.2.3. What Is Likely to Happen? Watching the Watchers . . .
      4. 17.2.4. How Watching the Watchers Affects Data Warehouse Architecture
    3. 17.3. Designing to Avoid Catastrophic Failure
      1. 17.3.1. Catastrophic Failures
      2. 17.3.2. Countering Catastrophic Failures
    4. 17.4. Intellectual Property and Fair Use
    5. 17.5. Cultural Trends in Data Warehousing
      1. 17.5.1. Managing by the Numbers across the Enterprise
      2. 17.5.2. Increased Reliance on Sophisticated Key Performance Indicators
      3. 17.5.3. Behavior Is the New Marquee Application
      4. 17.5.4. Packaged Applications Have Hit Their High Point
      5. 17.5.5. Application Integration Has to Be Done by Someone
      6. 17.5.6. Data Warehouse Outsourcing Needs a Sober Risk Assessment
    6. 17.6. In Closing
  21. Glossary