You are previewing Star Schema The Complete Reference™.
O'Reilly logo
Star Schema The Complete Reference™

Book Description

The definitive guide to dimensional design for your data warehouse

Learn the best practices of dimensional design. Star Schema: The Complete Reference offers in-depth coverage of design principles and their underlying rationales. Organized around design concepts and illustrated with detailed examples, this is a step-by-step guidebook for beginners and a comprehensive resource for experts.

This all-inclusive volume begins with dimensional design fundamentals and shows how dimensional design fits into diverse data warehouse architectures, including those of W.H. Inmon and Ralph Kimball. The book progresses through a series of advanced techniques that help you address real-world complexity, maximize performance, and adapt to the requirements of BI and ETL software products. You are furnished with design tasks and deliverables that can be incorporated into any project, regardless of architecture or methodology.

• Master the fundamentals of star schema design and slow change processing

• Identify situations that call for multiple stars or cubes

• Ensure compatibility across subject areas as your data warehouse grows

• Accommodate repeating attributes, recursive hierarchies, and poor data quality

• Support conflicting requirements for historic data

• Handle variation within a business process and correlation of disparate activities

• Boost performance using derived schemas and aggregates

• Learn when it's appropriate to adjust designs for BI and ETL tools

Table of Contents

  1. Cover Page
  2. Star Schema The Complete Reference™
  3. Copyright Page
  4. Dedication
  5. Contents
  6. Acknowledgments
  7. Introduction
  8. Part I Fundamentals
    1. Chapter 1 Analytic Databases and Dimensional Design
      1. Dimensional Design
        1. Purpose
        2. Measurement and Context
        3. Facts and Dimensions
        4. Grouping Dimensions and Facts
      2. The Star Schema
        1. Dimension Tables
        2. Keys and History
        3. Fact Tables
      3. Using a Star Schema
        1. Querying Facts
        2. Browsing Dimensions
      4. Guiding Principles
      5. Summary
      6. Further Reading
    2. Chapter 2 Data Warehouse Architectures
      1. Inmon’s Corporate Information Factory
      2. Kimball’s Dimensional Data Warehouse
      3. Stand-Alone Data Marts
      4. Architecture and Dimensional Design
        1. Contrasting the Approaches
        2. The Common Element
        3. Terms Used in This Book
      5. Summary
      6. Further Reading
    3. Chapter 3 Stars and Cubes
      1. Dimension Table Features
        1. Surrogate Keys and Natural Keys
        2. Rich Set of Dimensions
        3. Grouping Dimensions into Dimension Tables
      2. Fact Table Features
        1. Fact Tables and Processes
        2. Capturing Facts
        3. Grain
        4. Sparsity
        5. Degenerate Dimensions
      3. Slowly Changing Dimensions
        1. Type 1 Change
        2. Type 2 Change
        3. Choosing and Implementing Response Types
      4. Cubes
      5. Summary
      6. Further Reading
  9. Part II Multiple Stars
    1. Chapter 4 A Fact Table for Each Process
      1. Fact Tables and Business Processes
      2. Facts that Have Different Timing
        1. A Single Fact Table Causes Difficulties
        2. Modeling in Separate Fact Tables
      3. Facts that Have Different Grain
        1. A Single Fact Table Causes Difficulties
        2. Modeling in Separate Fact Tables
      4. Analyzing Facts from More than One Fact Table
        1. The Peril of Joining Fact Tables
        2. Drilling Across
        3. Drill-Across Implementations
      5. Summary
      6. Further Reading
    2. Chapter 5 Conformed Dimensions
      1. The Synergy of Multiple Stars
      2. Dimensions and Drilling Across
        1. What Causes Failure?
        2. Identical Tables Not Required
      3. Conformed Dimensions
        1. Types of Dimensional Conformance
        2. Planning Conformance
      4. Architecture and Conformance
        1. Dimensional Data Warehouse
        2. Corporate Information Factory
        3. Stand-Alone Data Marts
      5. Summary
      6. Further Reading
  10. Part III Dimension Design
    1. Chapter 6 More on Dimension Tables
      1. Grouping Dimensions into Tables
        1. Two Ways of Relating Dimension Attributes
        2. When Struggling with Dimension Groupings
      2. Breaking Up Large Dimensions
        1. Splitting Dimension Tables Arbitrarily
        2. Alternatives to Split Dimensions
        3. Mini-Dimensions Alleviate ETL Bottlenecks and Excessive Growth
      3. Dimension Roles and Aliasing
      4. Avoiding the NULL
        1. Problems Caused by NULL
        2. Avoiding NULL Foreign Key Values
        3. Uses for Special-Case Rows
      5. Behavioral Dimensions
        1. Converting Facts to Dimensions at Query Time
        2. Designing and Using Behavioral Dimensions
        3. Design Considerations for Behavioral Dimensions
      6. Summary
      7. Further Reading
    2. Chapter 7 Hierarchies and Snowflakes
      1. Drilling
        1. The Concept of Drilling
        2. The Reality of Drilling
      2. Attribute Hierarchies and Drilling
        1. The Attribute Hierarchy
        2. Drilling Within an Attribute Hierarchy
        3. Other Ways to Drill
        4. Documenting Attribute Hierarchies
      3. Snowflakes
        1. Avoiding the Snowflake
        2. Embracing the Snowflake
      4. Outriggers
        1. Repeating Groups
        2. Eliminating Repeating Groups with Outriggers
        3. Outriggers and Slow Change Processing
      5. Summary
      6. Further Reading
    3. Chapter 8 More Slow Change Techniques
      1. Time-Stamped Dimensions
        1. Point-in-Time Status of a Dimension
        2. The Time-Stamped Solution
      2. Type 3 Changes
        1. Study All Facts with Old or New Dimension Values
        2. The Type 3 Solution
      3. Hybrid Slow Changes
        1. Conflicting Requirements
        2. The Hybrid Response
        3. Evaluating and Extending the Hybrid Approach
      4. Summary
      5. Further Reading
    4. Chapter 9 Multi-Valued Dimensions and Bridges
      1. Standard One-to-Many Relationships
      2. Multi-Valued Dimensions
        1. Simplifying the Relationship
        2. Using a Bridge for Multi-Valued Dimensions
      3. Multi-Valued Attributes
        1. Simplifying the Multi-Valued Attribute
        2. Using an Attribute Bridge
      4. Summary
      5. Further Reading
    5. Chapter 10 Recursive Hierarchies and Bridges
      1. Recursive Hierarchies
        1. Rows Referring to Other Rows
        2. The Reporting Challenge
      2. Flattening a Recursive Hierarchy
        1. A Flattened Hierarchy
        2. Drawbacks of Flattening
        3. When Flattening Works Best
      3. The Hierarchy Bridge
        1. Hierarchy Bridge Design
        2. Using the Bridge
        3. Double-Counting
        4. Resolving the Many-to-Many Relationship
        5. Potential Misuse
      4. Changes and the Hierarchy Bridge
        1. Type 1 Changes in the Dimension or Bridge
        2. Type 2 Changes to the Dimension
        3. Type 2 Changes to the Hierarchy
      5. Variations on the Hierarchy Bridge
        1. Embellishing the Bridge
        2. Multiple Parents
        3. Multiple Hierarchies
      6. Summary
      7. Further Reading
  11. Part IV Fact Table Design
    1. Chapter 11 Transactions, Snapshots, and Accumulating Snapshots
      1. Transaction Fact Tables
        1. Describing Events
        2. Properties of Transaction Fact Tables
      2. Snapshot Fact Tables
        1. The Challenge: Studying Status
        2. The Snapshot Model
        3. Snapshot Considerations
      3. Accumulating Snapshot Fact Tables
        1. Challenge: Studying Elapsed Time Between Events
        2. The Accumulating Snapshot
        3. Accumulating Snapshot Considerations
      4. Summary
      5. Further Reading
    2. Chapter 12 Factless Fact Tables
      1. Events with No Facts
        1. Nothing to Measure?
        2. The Factless Fact Table
        3. Using a Factless Fact Table
        4. Adding a Fact
      2. Conditions, Coverage, or Eligibility
        1. Why Model Conditions?
        2. Factless Fact Tables for Conditions
        3. Comparing Activities and Conditions
        4. Slowly Changing Dimensions and Conditions
      3. Summary
      4. Further Reading
    3. Chapter 13 Type-Specific Stars
      1. Type-Specific Attributes
        1. Operational Systems
        2. Analytic Systems
      2. Core and Custom Stars
        1. Core and Custom Dimension Tables
        2. Core and Custom Fact Tables
        3. Other Considerations
      3. Using Generic Attributes
        1. Generic Attributes
        2. Using a Generic Design
      4. Summary
      5. Further Reading
  12. Part V Performance
    1. Chapter 14 Derived Schemas
      1. Restructuring Dimensional Data
        1. Uses for Derived Schemas
        2. Derived Schemas Already Covered
        3. The Cost of Derived Schemas
      2. The Merged Fact Table
        1. Precomputed Drill-Across Results
        2. Simplified Process Comparison
        3. Improved Performance
        4. Supporting Tools that Cannot Drill Across
        5. Single-Process Analysis
        6. Including a Nonshared Dimension
      3. The Pivoted Fact Table
        1. The Need to Pivot Data
        2. The Pivoted Advantage
        3. Drawbacks to Pivoting
      4. The Sliced Fact Table
        1. Creating Slices of a Star
        2. Uses for Sliced Fact Tables
        3. Slices First
      5. Set Operation Fact Tables
        1. Comparing Two Sets of Data
        2. Several Possible Comparisons
        3. Choosing to Precompute Set Operations
      6. Summary
      7. Further Reading
    2. Chapter 15 Aggregates
      1. Fundamentals of Aggregates
        1. Summarizing Base Data
        2. Using Aggregates
        3. Loading Aggregates
        4. Cubes as Aggregates
      2. Making Aggregates Invisible
        1. Aggregate Navigation
        2. Aggregate Generation
      3. Alternative Summary Designs
        1. Transformative Summaries May Also Be Useful
        2. Single Table Designs Should Be Avoided
      4. Summary
      5. Further Reading
  13. Part VI Tools and Documentation
    1. Chapter 16 Design and Business Intelligence
      1. Business Intelligence and SQL Generation
        1. SQL Generators
        2. The Limitations of SQL Generators
      2. Guidelines for the Semantic Layer
        1. Features to Avoid
        2. Features to Use
      3. Working with SQL-Generating BI Tools
        1. Multiple Stars
        2. Semi-Additivity
        3. Browse Queries
        4. Bridge Tables
      4. Working with Cube-Based BI
        1. Cube-Centric Business Intelligence
        2. Auto-Generation of Cubes
      5. Summary
      6. Further Reading
    2. Chapter 17 Design and ETL
      1. The ETL Process
        1. A Complex Task
        2. Tools Used by the ETL Process
        3. Architecture and the ETL Process
      2. Loading a Star
        1. A Top-Level Dependency
        2. Loading a Dimension Table
        3. Loading the Fact Table
      3. Optimizing the Load
        1. Changed Data Identification
        2. Simplifying Processing
      4. Cleansing Data
        1. What Should Be Cleaned Up
        2. Cleaning Up Dimensional Data
        3. Facts with Invalid Details
      5. Housekeeping Columns
        1. Housekeeping Columns in Dimension Tables
        2. Housekeeping and Fact Tables
      6. Summary
      7. Further Reading
    3. Chapter 18 How to Design and Document a Dimensional Model
      1. Dimensional Design and the Data Warehouse Life Cycle
        1. The Strategic Importance of Dimensional Design
        2. When to Do Dimensional Design
      2. Design Activities
        1. Planning the Design Effort
        2. Conducting Interviews
        3. Designing the Dimensional Model
        4. Prioritizing Plans
        5. Documenting the Results
      3. Documenting a Dimensional Model
        1. Requirements Documentation
        2. Top-Level Design Documentation
        3. Detailed Design Documentation
        4. Logical vs. Physical
      4. Summary
      5. Further Reading
  14. Index