You are previewing Data Warehousing For Dummies®, 2nd Edition.
O'Reilly logo
Data Warehousing For Dummies®, 2nd Edition

Book Description

Data warehousing is one of the hottest business topics, and there’s more to understanding data warehousing technologies than you might think. Find out the basics of data warehousing and how it facilitates data mining and business intelligence with Data Warehousing For Dummies, 2nd Edition.

Data is probably your company’s most important asset, so your data warehouse should serve your needs. The fully updated Second Edition of Data Warehousing For Dummies helps you understand, develop, implement, and use data warehouses, and offers a sneak peek into their future. You’ll learn to:

  • Analyze top-down and bottom-up data warehouse designs

  • Understand the structure and technologies of data warehouses, operational data stores, and data marts

  • Choose your project team and apply best development practices to your data warehousing projects

  • Implement a data warehouse, step by step, and involve end-users in the process

  • Review and upgrade existing data storage to make it serve your needs

  • Comprehend OLAP, column-wise databases, hardware assisted databases, and middleware

  • Use data mining intelligently and find what you need

  • Make informed choices about consultants and data warehousing products

Data Warehousing For Dummies, 2nd Edition also shows you how to involve users in the testing process and gain valuable feedback, what it takes to successfully manage a data warehouse project, and how to tell if your project is on track. You’ll find it’s the most useful source of data on the topic!

Table of Contents

  1. Copyright
  2. About The Author
  3. Author's Acknowledgments
  4. Publisher's Acknowledgments
  5. Introduction
    1. Why I Wrote This Book
    2. How to Use This Book
      1. Part I: The Data Warehouse: Home for Your Data Assets
      2. Part II: Data Warehousing Technology
      3. Part III: Business Intelligence and Data Warehousing
      4. Part IV: Data Warehousing Projects: How to Do Them Right
      5. Part V: Data Warehousing: The Big Picture
      6. Part VI: Data Warehousing in the Not-Too-Distant Future
      7. Part VII: The Part of Tens
    3. Icons Used in This Book
    4. About the Product References in This Book
  6. I. The Data Warehouse: Home for Your Data Assets
    1. 1. What's in a Data Warehouse?
      1. 1.1. The Data Warehouse: A Place for Your Data Assets
        1. 1.1.1. Classifying data: What is a data asset?
        2. 1.1.2. Manufacturing data assets
      2. 1.2. Data Warehousing: A Working Definition
        1. 1.2.1. Today's data warehousing defined
        2. 1.2.2. A broader, forward looking definition
      3. 1.3. A Brief History of Data Warehousing
        1. 1.3.1. Before our time — the foundation
        2. 1.3.2. The 1970s — the preparation
        3. 1.3.3. The 1980s — the birth
        4. 1.3.4. The 1990s — the adolescent
        5. 1.3.5. The 2000s — the adult
      4. 1.4. Is a Bigger Data Warehouse a Better Data Warehouse?
      5. 1.5. Realizing That a Data Warehouse (Usually) Has a Historical Perspective
      6. 1.6. It's Data Warehouse, Not Data Dump
    2. 2. What Should You Expect from Your Data Warehouse?
      1. 2.1. Using the Data Warehouse to Make Better Business Decisions
      2. 2.2. Finding Data at Your Fingertips
      3. 2.3. Facilitating Communications with Data Warehousing
        1. 2.3.1. IT-to-business organization communications
        2. 2.3.2. Communications across business organizations
      4. 2.4. Facilitating Business Change with Data Warehousing
    3. 3. Have It Your Way: The Structure of a Data Warehouse
      1. 3.1. Ensuring That Your Implementations Are Unique
      2. 3.2. Classifying the Data Warehouse
        1. 3.2.1. The data warehouse lite
          1. 3.2.1.1. Subject areas and data content
          2. 3.2.1.2. Data sources
          3. 3.2.1.3. Business intelligence tools
          4. 3.2.1.4. Database
          5. 3.2.1.5. Data extraction, movement, and loading
          6. 3.2.1.6. Architecture
        2. 3.2.2. The data warehouse deluxe
          1. 3.2.2.1. Subject areas and data content
          2. 3.2.2.2. Data sources
          3. 3.2.2.3. Business intelligence tools
          4. 3.2.2.4. Database
          5. 3.2.2.5. Data extraction, movement, and loading
          6. 3.2.2.6. Architecture
        3. 3.2.3. The data warehouse supreme
          1. 3.2.3.1. Subject areas and data content
          2. 3.2.3.2. Data sources
          3. 3.2.3.3. Business intelligence tools
          4. 3.2.3.4. Database
          5. 3.2.3.5. Data extraction, movement, and loading
          6. 3.2.3.6. Architecture
      3. 3.3. To Centralize or Distribute, That Is the Question
    4. 4. Data Marts: Your Retail Data Outlet
      1. 4.1. Architectural Approaches to Data Marts
        1. 4.1.1. Data marts sourced by a data warehouse
        2. 4.1.2. Top-down, quick-strike data marts
        3. 4.1.3. Bottom-up, integration-oriented data marts
      2. 4.2. What to Put in a Data Mart
        1. 4.2.1. Geography-bounded data
        2. 4.2.2. Organization-bounded data
        3. 4.2.3. Function-bounded data
        4. 4.2.4. Market-bounded data
        5. 4.2.5. Answers to specific business questions
        6. 4.2.6. Anything!
      3. 4.3. Data mart or data warehouse?
      4. 4.4. Implementing a Data Mart — Quickly
  7. II. Data Warehousing Technology
    1. 5. Relational Databases and Data Warehousing
      1. 5.1. The Old Way of Thinking
        1. 5.1.1. A technology-based discussion: The roots of relational database technology
        2. 5.1.2. The OLAP-only fallacy
      2. 5.2. The New Way of Thinking
        1. 5.2.1. Fine-tuning databases for data warehousing
        2. 5.2.2. Optimizing data access
        3. 5.2.3. Avoiding scanning unnecessary data
        4. 5.2.4. Handling large data volume
      3. 5.3. Designing Your Relational Database for Data Warehouse Usage
        1. 5.3.1. Looking at why traditional relational design techniques don't work well
          1. 5.3.1.1. Explaining normalization in plain language (or trying to)
          2. 5.3.1.2. The side effect of normalization
        2. 5.3.2. Exploring new ways to design a relational-based data warehouse
      4. 5.4. Relational Products and Data Warehousing
        1. 5.4.1. IBM Data Management family
        2. 5.4.2. Microsoft SQL Server
        3. 5.4.3. Oracle
    2. 6. Specialty Databases and Data Warehousing
      1. 6.1. Multidimensional Databases
        1. 6.1.1. The idea behind multidimensional databases
          1. 6.1.1.1. The basics
          2. 6.1.1.2. Is there a limit to the number of dimensions?
          3. 6.1.1.3. How should I choose the levels in a hierarchy?
          4. 6.1.1.4. Physical database structures in an MDDB
        2. 6.1.2. Are multidimensional databases still worth looking at?
      2. 6.2. Horizontal versus Vertical Data Storage Management
      3. 6.3. Data Warehouse Appliances
      4. 6.4. Data Warehousing Specialty Database Products
        1. 6.4.1. Cognos (An IBM company)
        2. 6.4.2. Microsoft
        3. 6.4.3. Oracle
        4. 6.4.4. Sybase IQ
        5. 6.4.5. Vertica
    3. 7. Stuck in the Middle with You: Data Warehousing Middleware
      1. 7.1. What Is Middleware?
      2. 7.2. Middleware for Data Warehousing
        1. 7.2.1. The services
        2. 7.2.2. Should you use tools or custom code?
      3. 7.3. What Each Middleware Service Does for You
        1. 7.3.1. Data selection and extractions
        2. 7.3.2. Data quality assurance, part I
        3. 7.3.3. Data movement, part I
        4. 7.3.4. Data mapping and transformation
        5. 7.3.5. Data quality assurance, part II
        6. 7.3.6. Data movement, part II
        7. 7.3.7. Data loading
      4. 7.4. Specialty Middleware Services
        1. 7.4.1. Replication services for data warehousing
        2. 7.4.2. Enterprise Information Integration services
          1. 7.4.2.1. EII architecture
          2. 7.4.2.2. Additional EII services
          3. 7.4.2.3. Facing the EII infrastructure challenge
      5. 7.5. Vendors with Middleware Products for Data Warehousing
        1. 7.5.1. Composite Software
        2. 7.5.2. IBM
        3. 7.5.3. Informatica
        4. 7.5.4. Ipedo
        5. 7.5.5. Microsoft
        6. 7.5.6. Oracle
        7. 7.5.7. Sybase (Avaki)
  8. III. Business Intelligence and Data Warehousing
    1. 8. An Intelligent Look at Business Intelligence
      1. 8.1. The Main Categories of Business Intelligence
        1. 8.1.1. Querying and reporting
        2. 8.1.2. Business analysis (OLAP)
        3. 8.1.3. Data mining
        4. 8.1.4. Dashboards and scorecards
      2. 8.2. Other Types of Business Intelligence
        1. 8.2.1. Statistical processing
        2. 8.2.2. Geographical information systems
        3. 8.2.3. Mash-ups
        4. 8.2.4. Business intelligence applications
      3. 8.3. Business Intelligence Architecture and Data Warehousing
    2. 9. Simple Database Querying and Reporting
      1. 9.1. What Functionality Does a Querying and Reporting Tool Provide?
        1. 9.1.1. The role of SQL
        2. 9.1.2. Technical query tools
        3. 9.1.3. User query tools
        4. 9.1.4. Reporting tools
        5. 9.1.5. The idea of managed queries and reports
      2. 9.2. Is This All You Need?
      3. 9.3. Designing a Relational Database for Querying and Reporting Support
      4. 9.4. Vendors with Querying and Reporting Products for Data Warehousing
        1. 9.4.1. Business Objects (SAP)
        2. 9.4.2. Cognos (IBM)
        3. 9.4.3. Information Builders
        4. 9.4.4. Microsoft
        5. 9.4.5. Oracle
    3. 10. Business Analysis (OLAP)
      1. 10.1. What Is Business Analysis?
      2. 10.2. The OLAP Acronym Parade
        1. 10.2.1. Business analysis (Visualization)
        2. 10.2.2. OLAP middleware
        3. 10.2.3. OLAP databases
      3. 10.3. First, an Editorial
      4. 10.4. Business Analysis (OLAP) Features: An Overview
        1. 10.4.1. Drill-down
        2. 10.4.2. Drill-up
        3. 10.4.3. Drill-across
        4. 10.4.4. Drill-through
        5. 10.4.5. Pivoting
        6. 10.4.6. Trending
        7. 10.4.7. Nesting
        8. 10.4.8. Visualizing
      5. 10.5. Data Warehousing Business Analysis Vendors
        1. 10.5.1. IBM
        2. 10.5.2. MicroStrategy
        3. 10.5.3. Oracle
        4. 10.5.4. Pentaho
        5. 10.5.5. SAP
        6. 10.5.6. SAS
    4. 11. Data Mining: Hi-Ho, Hi-Ho, It's Off to Mine We Go
      1. 11.1. Data Mining in Specific Business Missions
      2. 11.2. Data Mining and Artificial Intelligence
      3. 11.3. Data Mining and Statistics
      4. 11.4. Some Vendors with Data Mining Products
        1. 11.4.1. Microsoft
        2. 11.4.2. SAS
        3. 11.4.3. SPSS
    5. 12. Dashboards and Scorecards
      1. 12.1. Dashboard and Scorecard Principles
        1. 12.1.1. Dashboards
        2. 12.1.2. Scorecards
      2. 12.2. The Relationship between Dashboards, Scorecards, and the Other Parts of Business Intelligence
      3. 12.3. EIS and Key Indicators
      4. 12.4. The Briefing Book
      5. 12.5. The Portal Command Center
      6. 12.6. Who Produces EIS Products
  9. IV. Data Warehousing Projects: How to Do Them Right
    1. 13. Data Warehousing and Other IT Projects: The Same but Different
      1. 13.1. Why a Data Warehousing Project Is (Almost) Like Any Other Development Project
      2. 13.2. How to Apply Your Company's Best Development Practices to Your Project
      3. 13.3. How to Handle the Uniqueness of Data Warehousing
      4. 13.4. Why Your Data Warehousing Project Must Have Top-Level Buy-In
      5. 13.5. How Do I Conduct a Large, Enterprise-Scale Data Warehousing Initiative?
        1. 13.5.1. Top-down
        2. 13.5.2. Bottom-up
        3. 13.5.3. Mixed-mode
    2. 14. Building a Winning Data Warehousing Project Team
      1. 14.1. Don't Make This Mistake!
      2. 14.2. The Roles You Have to Fill on Your Project
        1. 14.2.1. Project manager
        2. 14.2.2. Technical leader
        3. 14.2.3. Chief architect
        4. 14.2.4. Business requirements analyst
        5. 14.2.5. Data modeler and conceptual/logical database designer
        6. 14.2.6. Database administrator and physical database designer
        7. 14.2.7. Front-end tools specialist and developer
        8. 14.2.8. Middleware specialist
        9. 14.2.9. Quality assurance (QA) specialist
        10. 14.2.10. Source data analyst
        11. 14.2.11. User community interaction manager
        12. 14.2.12. Technical executive sponsor
        13. 14.2.13. User community executive sponsor
      3. 14.3. And Now, the People
      4. 14.4. Organizational Operating Model
    3. 15. You Need What? When? — Capturing Requirements
      1. 15.1. Choosing between Being Business or Technically Driven
      2. 15.2. Technically-Driven Data Warehousing
        1. 15.2.1. Subject area
        2. 15.2.2. Enterprise data modeling
      3. 15.3. Business-Driven Business Intelligence
        1. 15.3.1. Starting with business questions
        2. 15.3.2. Accessing the value of the information
        3. 15.3.3. Defining key business objects
        4. 15.3.4. Building a business model
        5. 15.3.5. Prototyping and iterating with the users
        6. 15.3.6. Signing off on scope
    4. 16. Analyzing Data Sources
      1. 16.1. Begin with Source Data Structures, but Don't Stop There
      2. 16.2. Identify What Data You Need to Analyze
      3. 16.3. Line Up the Help You'll Need
      4. 16.4. Techniques for Analyzing Data Sources and Their Content
      5. 16.5. Analyze What's Not There: Data Gap Analysis
      6. 16.6. Determine Mapping and Transformation Logic
    5. 17. Delivering the Goods
      1. 17.1. Exploring Architecture Principles
        1. 17.1.1. What's an architecture?
        2. 17.1.2. What's an adaptable architecture?
      2. 17.2. Understanding Data Warehousing Architectural Keys
        1. 17.2.1. People and their roles
        2. 17.2.2. Consistent delivery process
        3. 17.2.3. Standard delivery platform
      3. 17.3. Assessing Your Data Warehouse Architecture
        1. 17.3.1. What are you building?
        2. 17.3.2. How are you building it?
        3. 17.3.3. Is the delivery automated?
      4. 17.4. Architecting through Abstraction
    6. 18. User Testing, Feedback, and Acceptance
      1. 18.1. Getting Users Involved Early in Data Warehousing
      2. 18.2. Using Real Business Situations
      3. 18.3. Ensuring That Users Provide Necessary Feedback
      4. 18.4. After the Scope: Involving Users during Design and Development
      5. 18.5. Understanding What Determines User Acceptance
  10. V. Data Warehousing: The Big Picture
    1. 19. The Information Value Chain: Connecting Internal and External Data
      1. 19.1. Identifying Data You Need from Other People
      2. 19.2. Recognizing Why External Data Is Important
      3. 19.3. Viewing External Data from a User's Perspective
      4. 19.4. Determining What External Data You Really Need
      5. 19.5. Ensuring the Quality of Incoming External Data
      6. 19.6. Filtering and Reorganizing Data after It Arrives
      7. 19.7. Restocking Your External Data
      8. 19.8. Acquiring External Data
        1. 19.8.1. Finding external information
        2. 19.8.2. Gathering general information
        3. 19.8.3. Cruising the Internet
      9. 19.9. Maintaining Control over External Data
        1. 19.9.1. Staying on top of changes
        2. 19.9.2. Knowing what to do with historical external data
        3. 19.9.3. Determining when new external data sources are available
        4. 19.9.4. Switching from one external data provider to another
    2. 20. Data Warehousing Driving Quality and Integration
      1. 20.1. The Infrastructure Challenge
      2. 20.2. Data Warehouse Data Stores
        1. 20.2.1. Source data feeds
        2. 20.2.2. Operational data store (ODS)
          1. 20.2.2.1. The ODS defined
          2. 20.2.2.2. An ODS example
          3. 20.2.2.3. ODS feedback loops
        3. 20.2.3. Master data management (MDM)
        4. 20.2.4. Service-oriented architecture (SOA)
      3. 20.3. Dealing with Conflict: Special Challenges to Your Data Warehousing Environment
    3. 21. The View from the Executive Boardroom
      1. 21.1. What Does Top Management Need to Know?
        1. 21.1.1. Tell them this
        2. 21.1.2. Keep selling the data warehousing project
      2. 21.2. Data Warehousing and the Business-Trends Bandwagon
      3. 21.3. Data Warehousing in a Cross-Company Setting
      4. 21.4. Connecting the Enterprise
    4. 22. Existing Sort-of Data Warehouses: Upgrade or Replace?
      1. 22.1. The Data Haves and Have-Nots
        1. 22.1.1. The first step: Cataloguing the extract files, who uses them, and why
        2. 22.1.2. And then, the review
      2. 22.2. Decisions, Decisions
        1. 22.2.1. Choice 1: Get rid of it
        2. 22.2.2. Choice 2: Replace it
        3. 22.2.3. Choice 3: Retain it
      3. 22.3. Caution: Migration Isn't Development — It's Much More Difficult
      4. 22.4. Beware: Don't Take Away Valued Functionality
    5. 23. Surviving in the Computer Industry (and Handling Vendors)
      1. 23.1. How to Be a Smart Shopper at Data Warehousing Conferences and Trade Shows
        1. 23.1.1. Do your homework first
        2. 23.1.2. Ask a lot of questions
        3. 23.1.3. Be skeptical
        4. 23.1.4. Don't get rushed into a purchase
      2. 23.2. Dealing with Data Warehousing Product Vendors
        1. 23.2.1. Check out the product and the company before you begin discussions
        2. 23.2.2. Take the lead during the meeting
        3. 23.2.3. Be skeptical — again
        4. 23.2.4. Be a cautious buyer
      3. 23.3. A Look Ahead: Data Warehousing, Mainstream Technologies, and Vendors
    6. 24. Working with Data Warehousing Consultants
      1. 24.1. Do You Really Need Consultants to Help Build a Data Warehouse?
      2. 24.2. Watch Out, Though!
      3. 24.3. A Final Word about Data Warehousing Consultants
  11. VI. Data Warehousing in the Not-too-Distant Future
    1. 25. Expanding Your Data Warehouse with Unstructured Data
      1. 25.1. Traditional Data Warehousing Means Analyzing Traditional Data Types
      2. 25.2. It's a Multimedia World, After All. . . .
      3. 25.3. How Does Business Intelligence Work with Unstructured Data?
      4. 25.4. An Alternative Path: From Unstructured Information to Structured Data
    2. 26. Agreeing to Disagree about Semantics
      1. 26.1. Defining Semantics
      2. 26.2. Emergence of the Semantic Web?
      3. 26.3. Preparing for Semantic Data Warehousing
      4. 26.4. Starting Out on Your Semantic Journey
        1. 26.4.1. Business intelligence semantic layer management
        2. 26.4.2. Business rules management
    3. 27. Collaborative Business Intelligence
      1. 27.1. Future Business Intelligence Support Model
        1. 27.1.1. Knowledge retention
        2. 27.1.2. Knowledge discovery
        3. 27.1.3. Knowledge proliferation
      2. 27.2. Leveraging Examples from Highly Successful Collaboration Solutions
        1. 27.2.1. Rate a report
        2. 27.2.2. Report relationships
        3. 27.2.3. Find a report
        4. 27.2.4. Find the meaning
        5. 27.2.5. Shared interests — shared information
        6. 27.2.6. Visualization
      3. 27.3. The Vision of Collaborative Business Intelligence
  12. VII. The Part of Tens
    1. 28. Ten Questions to Consider When You're Selecting User Tools
      1. 28.1. Do I Want a Smorgasbord or a Sit-Down Restaurant?
      2. 28.2. Can a User Stop a Runaway Query or Report?
      3. 28.3. How Does Performance Differ with Varying Amounts of Data?
      4. 28.4. Can Users Access Different Databases?
      5. 28.5. Can Data Definitions Be Easily Changed?
      6. 28.6. How Does the Tool Deploy?
      7. 28.7. How Does Performance Change if You Have a Large Number of Users?
      8. 28.8. What Online Help and Assistance Is Available, and How Good Is It?
      9. 28.9. Does the Tool Support Interfaces to Other Products?
      10. 28.10. What Happens When You Pull the Plug?
    2. 29. Ten Secrets to Managing Your Project Successfully
      1. 29.1. Tell It Like It Is
      2. 29.2. Put the Right People in the Right Roles
      3. 29.3. Be a Tough but Fair Negotiator
      4. 29.4. Deal Carefully with Product Vendors
      5. 29.5. Watch the Project Plan
      6. 29.6. Don't Micromanage
      7. 29.7. Use a Project Wiki
      8. 29.8. Don't Overlook the Effect of Organizational Culture
      9. 29.9. Don't Forget about Deployment and Operations
      10. 29.10. Take a Breather Occasionally
    3. 30. Ten Sources of Up-to-Date Information about Data Warehousing
      1. 30.1. The Data Warehousing Institute
      2. 30.2. The Data Warehousing Information Center
      3. 30.3. The OLAP Report
      4. 30.4. Intelligent Enterprise
      5. 30.5. b-eye Business Intelligence Network
      6. 30.6. Wikipedia
      7. 30.7. DMReview.com
      8. 30.8. BusinessIntelligence.com
      9. 30.9. Industry Analysts' Web Sites
      10. 30.10. Product Vendors' Web Sites
    4. 31. Ten Mandatory Skills for a Data Warehousing Consultant
      1. 31.1. Broad Vision
      2. 31.2. Deep Technical Expertise in One or Two Areas
      3. 31.3. Communications Skills
      4. 31.4. The Ability to Analyze Data Sources
      5. 31.5. The Ability to Distinguish between Requirements and Wishes
      6. 31.6. Conflict-Resolution Skills
      7. 31.7. An Early-Warning System
      8. 31.8. General Systems and Application Development Knowledge
      9. 31.9. The Know-How to Find Up-to-Date Information
      10. 31.10. A Hype-Free Vocabulary
    5. 32. Ten Signs of a Data Warehousing Project in Trouble
      1. 32.1. The Project's Scope Phase Ends with No General Consensus
      2. 32.2. The Mission Statement Gets Questioned after the Scope Phase Ends
      3. 32.3. Tools Are Selected without Adequate Research
      4. 32.4. People Get Pulled from Your Team for "Just a Few Days"
      5. 32.5. You're Overruled When You Attempt to Handle Scope Creep
      6. 32.6. Your Executive Sponsor Leaves the Company
      7. 32.7. You Overhear, "This Will Never Work, but I'm Not Saying Anything"
      8. 32.8. You Find a Major "Uh-Oh" in One of the Products You're Using
      9. 32.9. The IT Organization Responsible for Supporting the Project Pulls Its Support
      10. 32.10. Resignations Begin
    6. 33. Ten Signs of a Successful Data Warehousing Project
      1. 33.1. The Executive Sponsor Says, "This Thing Works — It Really Works!"
      2. 33.2. You Receive a Flood of Suggested Enhancements and Additional Capabilities
      3. 33.3. User Group Meetings Are Almost Full
      4. 33.4. The User Base Keeps Growing and Growing and Growing
      5. 33.5. The Executive Sponsor Cheerfully Volunteers Your Company as a Reference Site
      6. 33.6. The Company CEO Asks, "How Can I Get One of Those Things?"
      7. 33.7. The Response to Your Next Funding Request Is, "Whatever You Need — It's Yours."
      8. 33.8. You Get Promoted — and So Do Some of Your Team Members
      9. 33.9. You Achieve Celebrity Status in the Company
      10. 33.10. You Get Your Picture on the Cover of the Rolling Stone
    7. 34. Ten Subject Areas to Cover with Product Vendors
      1. 34.1. Product's Chief Architect
      2. 34.2. Development Team
      3. 34.3. Customer Feedback
      4. 34.4. Employee Retention
      5. 34.5. Marketplace
      6. 34.6. Product Uniqueness
      7. 34.7. Clients
      8. 34.8. The Future
      9. 34.9. Internet and Internet Integration Approach
      10. 34.10. Integrity