You are previewing Data Warehousing: Architecture and Implementation.
O'Reilly logo
Data Warehousing: Architecture and Implementation

Book Description


A start-to-finish process for deploying successful data warehouses.

This book delivers what every data warehousing project participant needs most: a thorough overview of today's best solutions, and a reliable step-by-step process for building warehouses that meet their objectives. It answers the key questions asked by everyone involved in a data warehouse initiative: project sponsors, developers, managers, and CIOs. And, with over 75 figures, it doesn't just tell you how to get the job done: it shows you.

  • Migration strategies and scenarios

  • Management and support: issue resolution, capacity planning, security, backup, and more

  • Specific answers for project sponsors, managers, and CIOs

  • 12 steps for implementation

  • Best techniques for schema design and metadata

  • Choosing the right hardware, software, and platforms

  • Evolving as new technologies mature

Rely on this book's up-to-date listings of vendors and Web resources. Learn to evolve your data warehouse as new technologies mature-including metadata interchange standards, Web solutions, and Windows …Ø NT. Whatever your goals, Data Warehousing for IT Professionals will help you achieve them faster and at lower cost.

Table of Contents

  1. Copyright
  2. Preface
  3. 1. Introduction
    1. 1. The Enterprise IT Architecture
      1. The Past: Evolution of Enterprise Architectures
      2. The Present: The IT Professional's Responsibility
        1. Meet Business Requirements
        2. Take Advantage of Technology Advancements
      3. Business Perspective
        1. Operational
        2. Decisional
        3. Informational
        4. Virtual Corporation
      4. Technology Perspective
        1. Operational Needs
          1. Legacy Systems
          2. OLTP Applications
          3. Active Databases
          4. Operational Data Stores
          5. Flash Monitoring and Reporting
          6. Workflow Management and Groupware
        2. Decisional Needs
          1. Data Warehouse
          2. Decision Support Applications
        3. Informational Needs
          1. Informational Web Services and Scripts
        4. Virtual Corporation Needs
          1. Transactional Web Services and Scripts
        5. Other Components
      5. Architecture Migration Scenarios
        1. Legacy Integration
          1. The Need
          2. The Recommended Approach
        2. Operational Monitoring
          1. The Need
          2. The Recommended Approach
        3. Process Implementation
          1. The Need
          2. The Recommended Approach
        4. Decision Support
          1. The Need
          2. The Recommended Approach
        5. Hyperdata Distribution
          1. The Need
          2. The Recommended Approach
        6. Virtual Corporation
          1. The Need
          2. The Recommended Approach
      6. Migration Strategy: How Do We Move Forward?
        1. Review the Current Enterprise Architecture
        2. Identify Information Architecture Requirements
        3. Develop a Migration Plan Based on Requirements
          1. The Natural Migration Path
          2. The Customized Migration Path
        4. Monitor and Update the Migration Plan
      7. In Summary
    2. 2. Data Warehouse Concepts
      1. Gradual Changes in Computing Focus
        1. Early Computing Focused on Operational Requirements
        2. Decisional Requirements Cannot Be Fully Anticipated
        3. Operational Systems Fail to Provide Decisional Information
        4. Decisional Systems Have Evolved to Meet Decisional Requirements
      2. The Data Warehouse Defined
        1. Integrated
        2. Subject Oriented
        3. Databases
        4. Required for Decision-Making
        5. Each Unit of Data Is Relevant to a Point in Time
        6. A Data Warehouse Contains Both Atomic and Summarized Data
      3. The Dynamic, Ad Hoc Report
      4. The Purposes of a Data Warehouse
        1. To Provide Business Users with Access to Data
        2. To Provide One Version of the Truth
        3. To Record the Past Accurately
        4. To Slice and Dice Through Data
        5. To Separate Analytical and Operational Processing
        6. To Support the Reengineering of Decisional Processes
      5. A Word About Data Marts
        1. Misconception: Data Warehouses and Data Marts Cannot Coexist
        2. Misconception: Data Marts Can Be Built Independently of One Another
      6. A Word About Operational Data Stores
        1. Definition of Operational Data Stores
        2. Flash Monitoring and Reporting Tools
        3. Relationship of Operational Data Stores to Data Warehouse
      7. Data Warehouse Cost-Benefit Analysis / Return on Investment
        1. Benefits
        2. Costs
        3. ROI Considerations
      8. In Summary
  4. 2. People
    1. 3. The Project Sponsor
      1. How Will a Data Warehouse Affect our Decision-Making Processes?
      2. How Does a Data Warehouse Improve My Financial Processes? Marketing? Operations?
        1. Financial Processes
        2. Marketing
        3. Operations
      3. When Is a Data Warehouse Project Justified?
        1. Lack of Information Sharing
        2. Different Groups Produce Conflicting Reports
        3. Tedious Report Creation Process
        4. Reports Are Not Dynamic, and Do Not Support an Ad Hoc Usage Style
        5. Reports That Require Historical Data Are Difficult to Produce
      4. What Expenses Are Involved?
        1. Hardware
        2. Software
        3. Services
        4. Internal Staff
        5. Summary of Typical Costs
      5. What Are the Risks?
        1. Organizational
          1. Wrong Project Sponsor
          2. End-User Community Not Involved
          3. Senior Management Expectations Not Managed
          4. End-User Community Expectations Not Managed
          5. Political Issues
          6. Logistical Overhead
        2. Technological
          1. Technical Architecture and Networking
        3. Project Management
          1. Defining Project Scope Inappropriately
          2. Underestimating Project Time Frame
          3. Underestimating Project Overhead
          4. Losing Focus
          5. Not Looking Beyond the First Data Warehouse Rollout
        4. Data Warehouse Design
      6. Risk-Mitigating Approaches
      7. Is My Organization Ready for a Data Warehouse?
        1. Decision-Makers Feel the Need for Change
        2. Users Clamor for Integrated Decisional Data
        3. The Operational Systems Are Fairly Stable
        4. Staff Can Be Assigned to the Project
        5. There Is Adequate Funding
      8. How Do I Measure the Results?
      9. In Summary
    2. 4. The CIO
      1. How Do I Support the Data Warehouse?
        1. Regular Warehouse Load
        2. Applications
        3. Warehouse DB Optimization
        4. User Assistance or Help Desk
        5. Training
        6. Preparation for Subsequent Rollouts
      2. How Will My Data Warehouse Evolve?
      3. Who Should Be Involved in a Data Warehouse Project?
        1. Steering Committee
        2. User Reference Group
        3. Warehouse Driver
        4. Warehouse Project Manager
        5. Business Analyst(s)
        6. Warehouse Data Architect
        7. Metadata Administrator
        8. Warehouse DBA
        9. Source System Database Administrators (DBAs) and System Administrators (SAs)
        10. Conversion and Extraction Programmer(s)
        11. Technical and Network Architect
        12. Trainer
      4. What Is the Team Structure Like?
      5. What New Skills Will My People Need?
        1. IT Professionals
        2. End Users
      6. How Does Data Warehousing Fit into My IT Architecture?
      7. How Many Vendors Do I Need to Talk to?
        1. Vendor Categories
        2. Enterprise Options
      8. What Should I Look for in a Data Warehouse Vendor?
        1. Solution Framework
        2. Project and Integration Consultancy Services
        3. Front-End/OLAP/Decision Support/Data Access and Retrieval Tools
        4. Middleware/Data Extraction and Transformation Tools
        5. Relational Database Management Systems
        6. Hardware or Operating System Platforms
      9. How Does Data Warehousing Affect My Existing Systems?
        1. Improvement Areas in Operational Systems
      10. Data Warehousing and Its Impact on Other Enterprise Initiatives
        1. How Does Data Warehousing Tie In with BPR?
        2. How Does Data Warehousing Tie In with Intranets?
      11. When Is a Data Warehouse Not Appropriate?
        1. When the Operational Systems Are Not Ready
        2. When the Need Is Operational Integration
      12. How Do I Manage or Control a Data Warehouse Initiative?
      13. In Summary
    3. 5. The Project Manager
      1. How Do I Roll Out a Data Warehouse Initiative?
        1. Start with a Data Warehouse Planning Activity
        2. Implement a Proof-of-Concept Pilot
        3. Extend Functionality Iteratively
      2. How Important Is the Hardware Platform?
      3. What Technologies Are Involved?
      4. Do I Still Use Relational Databases for Data Warehousing?
        1. Two Rival Approaches
        2. Warehousing Architectures
        3. Trade-Offs: MDDB vs. RDBMS
      5. How Long Does a Data Warehousing Project Last?
      6. How Is a Data Warehouse Different from Other IT Projects?
        1. A Data Warehouse Project is Not a Package Implementation Project
        2. A Data Warehouse Never Stops Evolving; It Changes with the Business
        3. Data Warehouses Are Huge
        4. Project Progress and Effort Are Highly Dependent on Accessibility and Quality of Source System Data
      7. What Are the Critical Success Factors of a Data Warehousing Project?
      8. In Summary
  5. 3. Process
    1. 6. Warehousing Strategy
      1. Strategy Components
      2. Determine Organizational Context
      3. Conduct Preliminary Survey of Requirements
        1. Interview Categories and Sample Questions
        2. Interviewing Tips
      4. Conduct Preliminary Source System Audit
      5. Identify External Data Sources (If Applicable)
      6. Define Warehouse Roolouts (Phased Implementation)
      7. Define Preliminary Data Warehouse Architecture
      8. Evaluate Development and Production Environment and Tools
      9. In Summary
    2. 7. Warehouse Management and Support Processes
      1. Define Issue Tracking and Resolution Process
      2. Perform Capacity Planning
      3. Define Warehouse Purging Rules
      4. Define Security Measures
      5. Define Backup and Recovery Strategy
      6. Set Up Collection of Warehouse Usage Statistics
      7. In Summary
    3. 8. Data Warehouse Planning
      1. Assemble and Orient Team
      2. Conduct Decisional Requirements Analysis
        1. Decisional Requirements Analysis Is Working Top-Down
        2. Conducting Warehouse Planning without a Warehouse Strategy
      3. Conduct Decisional Source System Audit
        1. Data Sources Can Be Internal or External
        2. DBAs and IT Support Staff Are the Best Resource Persons
      4. Design Logical and Physical Warehouse Schema
      5. Produce Source-to-Target Field Mapping
        1. Many-to-Many Mappings
        2. Historical Data and Evolving Data Structures
      6. Select Development and Production Environment and Tools
      7. Create Prototype for This Rollout
      8. Create Implementation Plan of This Rollout
      9. Warehouse Planning Tips and Caveats
        1. Follow the Data Trail
        2. Limitations Imposed by Currently Available Data
        3. Improvements to Source Systems
      10. In Summary
    4. 9. Data Warehouse Implementation
      1. Acquire and Set Up Development Environment
      2. Obtain Copies of Operational Tables
      3. Finalize Physical Warehouse Schema Design
      4. Build or Configure Extraction and Transformation Subsystems
        1. Extraction Subsystem
        2. Transformation Subsystem
      5. Build or Configure Data Quality Subsystem
        1. Causes of Data Errors
        2. Data Quality Improvement Approach
        3. Data Quality Assessment and Improvements
        4. Correcting Data Errors at the Source
        5. Other Considerations
      6. Build Warehouse Load Subsystem
        1. Basic Features of a Load Subsystem
        2. Loading Dirty Data
        3. The Need for Load Optimization
        4. Test Loads
        5. Set Up Data Warehouse Schema
      7. Set Up Warehouse Metadata
      8. Set Up Data Access and Retrieval Tools
        1. Acquire and Install Data Access and Retrieval Tools
        2. Build Predefined Reports and Queries
        3. Set Up Role or Group Profiles
        4. Set Up User Profiles and Map These to Role Profiles
      9. Perform the Production Warehouse Load
      10. Conduct User Training
        1. Scope of User Training
        2. Who Should Attend the Training?
        3. Different Users Have Different Training Needs
        4. Training as a Prerequisite to Testing
      11. Conduct User Testing and Acceptance
        1. Conduct Warehouse Trials
        2. Acceptance
      12. In Summary
  6. 4. Technology
    1. 10. Hardware and Operating Systems
      1. Parallel Hardware Technology
      2. Hardware Selection Criteria
      3. In summary
    2. 11. Warehousing Software
      1. Middleware and Connectivity Tools
      2. Extraction Tools
        1. Tool Selection
        2. Extraction Methods
      3. Transformation Tools
      4. Data Quality Tools
      5. Data Loaders
      6. Database Management Systems
        1. Relational Database Management Systems (RDBMS)
        2. Multidimensional Databases (MDDBs)
        3. Convergence of RDBMSes and MDDBs
      7. Metadata Repository
      8. Data Access and Retrieval Tools
        1. Online Analytical Processing (OLAP) Tools
        2. Reporting Tools
        3. Executive Information Systems (EIS)
        4. Data Mining
        5. Exception Reporting and Alert Systems
        6. Web-Enabled Products
      9. Data Modeling Tools
      10. Warehouse Management Tools
      11. Source Systems
      12. In Summary
    3. 12. Warehouse Schema Design
      1. OLTP Systems Use Normalized Data Structures
        1. Users Find Normalized Data Structures Difficult to Understand
        2. Normalized Data Structures Require Knowledge of SQL
        3. Normalized Data Structures Are Not Optimized to Support Decisional Queries
      2. Dimensional Modeling for Decisional Systems
      3. Two Types of Tables: Facts and Dimensions
        1. Fact Tables
        2. Dimension Tables
        3. Facts and Dimensions in Reports
      4. A Schema Is a Fact Table Plus Its Related Dimension Tables
      5. Facts Are Fully Normalized, Dimensions Are Denormalized
      6. Dimensional Hierarchies and Hierarchical Drilling
      7. The Time Dimension
      8. The Granularity of the Fact Table
      9. The Fact Table Key Concatenates Dimension Keys
      10. Aggregates or Summaries
        1. Computation of Aggregates is Based on Base-Level Schemas
        2. Aggregates Have Fewer Records than Do Base-Level Schemas
      11. Dimensional Attributes
      12. Multiple Star Schemas
      13. Core and Custom Tables
      14. In Summary
    4. 13. Warehouse Metadata
      1. Metadata Are a Form of Abstration
      2. Why Are Metadata Important?
        1. Metadata Establish the Context of the Warehouse Data
        2. Metadata Facilitate the Analysis Process
        3. Metadata Are a Form of Audit Trail for Data Transformation
        4. Metadata Improve or Maintain Data Quality
      3. Metadata Types
        1. Administrative Metadata
        2. End-User Metadata
        3. Optimization Metadata
      4. Versioning
      5. Metadata as the Basis for Automating Warehousing Tasks
      6. In Summary
    5. 14. Warehousing Applications
      1. The Early Adopters
      2. Types of Warehousing Applications
        1. Sales and Marketing
      3. Financial Analysis and Management
        1. General Reporting
        2. Customer Care and Service
      4. Specialized Applications of Warehousing Technology
        1. Call Center Integration
        2. Credit Bureau Systems
      5. In Summary
  7. 5. Where to Now?
    1. 15. Warehouse Maintenance and Evolution
      1. Regular Warehous Loads
      2. Warehouse Statistics Collection
      3. Warehouse User Profiles
      4. Security and Access Profiles
      5. Data Quality
      6. Data Growth
      7. Updates to Warehouse Subsystems
        1. Source System Evolution
        2. Use of New or Additional External Data
      8. Database Optimization and Tuning
      9. Data Warehouse Staffing
      10. Warehouse Staff and User Training
        1. User Training
        2. Warehouse Staff Training
      11. Subsequent Warehouse Rollouts
      12. Chargeback Schemes
      13. Disaster Recovery
      14. In Summary
    2. 16. Warehousing Trends
      1. Continued Growth of the Data Warehouse Industry
      2. Increased Adoption of Warehousing Technology by More Industries
      3. Increased Maturity of Data Mining Technologies
      4. Emergence and Use of Metadata Interchange Standards
      5. Increased Availability of Web-Enabled Solutions
      6. Popularity of Windows NT for Data Mart Projects
      7. Availability of Warehousing Modules for Application Packages
      8. More Mergers and Acquisitions Among Warehouse Players
      9. In Summary
  8. 6. Appendices
    1. A. R/OLAPXL® User's Manual
      1. Welcome to R/OLAPXL!
        1. Sample Database and Reports
        2. ODBC Drivers Required
        3. The R/OLAPXL Server
        4. Manual Contents
        5. Conventions Used in This Manual
        6. Other Documentation
        7. System Requirements
      2. Installation
        1. Installation Procedure
        2. Successful Installation
        3. Uninstalling R/OLAPXL
        4. To uninstall the R/OLAPXL Client Software:
      3. Tutorial
        1. Starting Up R/OLAPXL
        2. Selecting a Data Source
        3. Opening a R/OLAPXL Report
        4. Running the R/OLAPXL Report
        5. Printing a R/OLAPXL Report
        6. Modifying a R/OLAPXL Report
      4. User's Guide
        1. Basic Concepts
        2. Creating a R/OLAPXL Report
        3. Saving R/OLAPXL Reports
        4. Closing a R/OLAPXL Report
      5. Working with R/OLAPXL Columns
        1. Adding a R/OLAPXL Column
        2. Moving a R/OLAPXL Column
        3. Deleting R/OLAPXL Columns
        4. Using Derived Columns
        5. Using Functions
        6. Drilling Up and Down Using R/OLAPXL Columns
        7. Ad Hoc Drills
        8. Hierarchical Drills
        9. Drilling Up Hierarchically
        10. Drilling Down Hierarchically
        11. Drilling To Hierarchically
        12. Constraining Your R/OLAPXL Worksheet
        13. Constraint Sets in R/OLAPXL
      6. Setting R/OLAPXL Options
        1. Rerun Report After Drilling
        2. Start Cell
        3. Suppress Repeating Values
        4. Cascading User Formulas in the Report Body
        5. Running a R/OLAPXL Report
        6. Using Subtotals
        7. Subtotals and Column Suppressing
        8. Subtotals and Sorting
        9. Sorting R/OLAPXL Reports
        10. Sorting and Column Suppressing
        11. Sorting and Subtotals
        12. Using Pivot Tables with your R/OLAPXL Reports
        13. Pivot Table Data Sources
        14. Commands for R/OLAPXL Pivot Table Reports
        15. New or View R/OLAPXL Pivot Report
        16. View R/OLAPXL Pivot Report Source
        17. Refresh R/OLAPXL Pivot Report
        18. Other Pivot Table Commands
        19. Viewing R/OLAPXL SQL Statements
      7. The R/OLAPXL Toolbars
        1. The Standard Toolbar
        2. The Pivot Toolbar
      8. Macro Programming
        1. Simple Parameterized Reports
        2. Warehouse Schema Knowledge Is Required
        3. Creating Parameterized Reports
        4. API_SetConstraint Function
        5. API_SetConstraintDirect Function
        6. API_Value Function
        7. R/OLAPXL Sample Retail Database
        8. Sales Fact
        9. Product Dimension
        10. Time Dimension
        11. Promotion Dimension
        12. Store Dimension
      9. R/OLAPXL Messages
        1. Error Messages
        2. Informational Messages
        3. Warning Messages
        4. Prompts
    2. B. Warehouse Designer® User's Manual
      1. Welcome to Warehouse Designer!
        1. Manual Contents
        2. Conventions Used in this Manual
        3. System Requirements
        4. Installation Procedure
        5. Successful Installation
        6. Uninstalling Warehouse Designer
      2. Basic Consepts
        1. Dimensional Modeling Concepts
        2. Business Concepts
        3. The Warehouse Explorer
      3. The Warehouse Designer Toolbars
        1. The [Warehouse Explorer] Toolbar
        2. The Data Forms Toolbar
      4. Applications
      5. Dimensions
      6. Schemas
      7. Custom Dimensions
      8. Custom Schemas
      9. Aggregate Dimensions
      10. Aggregate Schemas
    3. C. Online Data Warehousing Resources
      1. Date Warehousing Associations
      2. Online Resources
      3. Online Magazines and Periodicals
      4. Data Warehousing List Server
      5. Newsgroups
    4. D. Tool and Vendor Inventory
      1. Extraction, Transportation and Transformation Technologies
      2. Data Quality Tools
      3. Data Access and Retrieval Tools
      4. Data Modeling Tools
      5. Database Management Tools
      6. Metadata Management Tools
      7. External Data Sources
    5. E. Software License Agreement