Enterprise Knowledge Management

Book Description

Today, companies capture and store tremendous amounts of information about every aspect of their business: their customers, partners, vendors, markets, and more. But with the rise in the quantity of information has come a corresponding decrease in its quality--a problem businesses recognize and are working feverishly to solve.
Enterprise Knowledge Management: The Data Quality Approach presents an easily adaptable methodology for defining, measuring, and improving data quality. Author David Loshin begins by presenting an economic framework for understanding the value of data quality, then proceeds to outline data quality rules and domain- and mapping-based approaches to consolidating enterprise knowledge. Written for both a managerial and a technical audience, this book will be indispensable to the growing number of companies committed to wresting every possible advantage from their vast stores of business information.

Key Features
* Expert advice from a highly successful data quality consultant
* The only book on data quality offering the business acumen to appeal to managers and the technical expertise to appeal to IT professionals
* Details the high costs of bad data and the options available to companies that want to transform mere data into true enterprise knowledge
* Presents conceptual and practical information complementing companies' interest in data warehousing, data mining, and knowledge discovery

Table of Contents

  1. Titlepage
  2. Contents
  3. PREFACE
  4. Chapter 1: INTRODUCTION
    1. 1.1 Data Quality Horror Stories
    2. 1.2 Knowledge Management and Data Quality
    3. 1.3 Reasons for Caring About Data Quality
    4. 1.4 Knowledge Management and Business Rules
    5. 1.5 Structure of This Book
  5. Chapter 2: WHO OWNS INFORMATION?
    1. 2.1 The Information Factory
    2. 2.2 Complicating Notions
    3. 2.3 Responsibilities of Ownership
    4. 2.4 Ownership Paradigms
    5. 2.5 Centralization, Decentralization, and Data Ownership Policies
    6. 2.6 Ownership and Data Quality
    7. 2.7 Summary
  6. Chapter 3: DATA QUALITY IN PRACTICE
    1. 3.1 Data Quality Defined: Fitness for Use
    2. 3.2 The Data Quality Improvement Program
    3. 3.3 Data Quality and Operations
    4. 3.4 Data Quality and Databases
    5. 3.5 Data Quality and the Data Warehouse
    6. 3.6 Data Mining
    7. 3.7 Data Quality and Electronic Data Interchange
    8. 3.8 Data Quality and the World Wide Web
    9. 3.9 Summary
  7. Chapter 4: ECONOMIC FRAMEWORK OF DATA QUALITY AND THE VALUE PROPOSITION
    1. 4.1 Evidence of Economic Impact
    2. 4.2 Data Flows and Information Chains
    3. 4.3 Examples of Information Chains
    4. 4.4 Impacts
    5. 4.5 Economic Measures
    6. 4.6 Impact Domains
    7. 4.7 Operational Impacts
    8. 4.8 Tactical and Strategic Impacts
    9. 4.9 Putting It All Together — The Data Quality Scorecard
    10. 4.10 Adjusting the Model for Solution Costs
    11. 4.11 Example
    12. 4.12 Summary
  8. Chapter 5: DIMENSIONS OF DATA QUALITY
    1. 5.1 Sample Data Application
    2. 5.2 Data Quality of Data Models
    3. 5.3 Data Quality of Data Values
    4. 5.4 Data Quality of Data Domains
    5. 5.5 Data Quality of Data Presentation
    6. 5.6 Data Quality of Information Policy
    7. 5.7 Summary: Importance of the Dimensions of Data Quality
  9. Chapter 6: STATISTICAL PROCESS CONTROL AND THE IMPROVEMENT CYCLE
    1. 6.1 Variation and Control
    2. 6.2 Control Chart
    3. 6.3 The Pareto Principle
    4. 6.4 Building a Control Chart
    5. 6.5 Kinds of Control Charts
    6. 6.6 Example: Invalid Records
    7. 6.7 The Goal of Statistical Process Control
    8. 6.8 Interpreting a Control Chart
    9. 6.9 Finding Special Causes
    10. 6.10 Maintaining Control
    11. 6.11 Summary
  10. Chapter 7: DOMAINS, MAPPINGS, AND ENTERPRISE REFERENCE DATA
    1. 7.1 Data Types
    2. 7.2 Operations
    3. 7.3 Domains
    4. 7.4 Mappings
    5. 7.5 Example: Social Security Numbers
    6. 7.6 Domains, Mappings, and Metadata
    7. 7.7 The Publish/Subscribe Model of Reference Data Provision
    8. 7.8 Summary: Domains, Mappings, and Reference Data
  11. Chapter 8: DATA QUALITY ASSERTIONS AND BUSINESS RULES
    1. 8.1 Data Quality Assertions
    2. 8.2 Data Quality Assertions as Business Rules
    3. 8.3 The Nine Classes of Data Quality Rules
    4. 8.4 Null Value Rules
    5. 8.5 Value Manipulation Operators and Functions
    6. 8.6 Value Rules
    7. 8.7 Domain Membership Rules
    8. 8.8 Domain Mappings and Relations on Finite Defined Domains
    9. 8.9 Relation Rules
    10. 8.10 Table, Cross-Table, and Cross-Message Assertions
    11. 8.11 In-Process Rules
    12. 8.12 Operational Rules
    13. 8.13 Other Rules
    14. 8.14 Rule Management, Compilation, and Validation
    15. 8.15 Rule Ordering
    16. 8.16 Summary
  12. Chapter 9: MEASUREMENT AND CURRENT STATE ASSESSMENT
    1. 9.1 Identify Each Data Customer
    2. 9.2 Mapping the Information Chain
    3. 9.3 Choose Locations in the Information Chain
    4. 9.4 Choose a Subset of the DQ Dimensions
    5. 9.5 Identify Sentinel Rules
    6. 9.6 Measuring Data Quality
    7. 9.7 Measuring Data Quality of Data Models
    8. 9.8 Measuring Data Quality of Data Values
    9. 9.9 Measuring Data Quality of Data Domains
    10. 9.10 Measuring Data Quality of Data Presentation
    11. 9.11 Measuring Data Quality of Information Policy
    12. 9.12 Static vs Dynamic Measurement
    13. 9.13 Compiling Results
    14. 9.14 Summary
  13. Chapter 10: DATA QUALITY REQUIREMENTS
    1. 10.1 The Assessment Process, Reviewed
    2. 10.2 Reviewing the Assessment
    3. 10.3 Determining Expectations
    4. 10.4 Use Case Analysis
    5. 10.5 Assignment of Responsibility
    6. 10.6 Creating Requirements
    7. 10.7 The Data Quality Requirements
    8. 10.8 Summary
  14. Chapter 11: METADATA, GUIDELINES, AND POLICY
    1. 11.1 Generic Elements
    2. 11.2 Data Types and Domains
    3. 11.3 Schema Metadata
    4. 11.4 Use and Summarization
    5. 11.5 Historical
    6. 11.6 Managing Data Domains
    7. 11.7 Managing Domain Mappings
    8. 11.8 Managing Rules
    9. 11.9 Metadata Browsing
    10. 11.10 Metadata as a Driver of Policy
    11. 11.11 Summary
  15. Chapter 12: RULE-BASED DATA QUALITY
    1. 12.1 Rule Basics
    2. 12.2 What Is a Business Rule?
    3. 12.3 Data Quality Rules Are Business Rules (and Vice Versa)
    4. 12.4 What Is a Rule-Based System?
    5. 12.5 Advantages of the Rule-Based Approach
    6. 12.6 Integrating a Rule-Based System
    7. 12.7 Rule Execution
    8. 12.8 Deduction vs Goal-Orientation
    9. 12.9 Evaluation of a Rules System
    10. 12.10 Limitations of the Rule-based Approach
    11. 12.11 Rule-Based Data Quality
    12. 12.12 Summary
  16. Chapter 13: METADATA AND RULE DISCOVERY
    1. 13.1 Domain Discovery
    2. 13.2 Mapping Discovery
    3. 13.3 Clustering for Rule Discovery
    4. 13.4 Key Discovery
    5. 13.5 Decision and Classification Trees
    6. 13.6 Association Rules and Data Quality Rules
    7. 13.7 Summary
  17. Chapter 14: DATA CLEANSING
    1. 14.1 Standardization
    2. 14.2 Common Error Paradigms
    3. 14.3 Record Parsing
    4. 14.4 Metadata Cleansing
    5. 14.5 Data Correction and Enhancement
    6. 14.6 Approximate Matching and Similarity
    7. 14.7 Consolidation
    8. 14.8 Updating Missing Fields
    9. 14.9 Address Standardization
    10. 14.10 Summary
  18. Chapter 15: ROOT CAUSE ANALYSIS AND SUPPLIER MANAGEMENT
    1. 15.1 What Is Root Cause Analysis?
    2. 15.2 Debugging the Process
    3. 15.3 Debugging the Problem
    4. 15.4 Corrective Measures — Resolve or Not?
    5. 15.5 Supplier Management
    6. 15.6 Summary
  19. Chapter 16: DATA ENRICHMENT/ENHANCEMENT
    1. 16.1 What Is Data Enhancement?
    2. 16.2 Examples of Data Enhancement
    3. 16.3 Enhancement Through Standardization
    4. 16.4 Enhancement Through Provenance
    5. 16.5 Enhancement Through Context
    6. 16.6 Enhancement Through Data Merging
    7. 16.7 Data Matching, Merging, and Record Linkage
    8. 16.8 Large-Scale Data Aggregation and Linkage
    9. 16.9 Improving Linkage with Approximate Matching
    10. 16.10 Enhancement Through Inference
    11. 16.11 Data Quality Rules for Enhancement
    12. 16.12 Business Rules for Enhancement
    13. 16.13 Summary
  20. Chapter 17: DATA QUALITY AND BUSINESS RULES IN PRACTICE
    1. 17.1 Turning Rules into Implementation
    2. 17.2 Operational Directives
    3. 17.3 Data Quality and the Transaction Factory
    4. 17.4 Data Quality and the Data Warehouse
    5. 17.5 Rules and EDI
    6. 17.6 Data Quality Rules and Automated UIs
    7. 17.7 Summary
  21. Chapter 18: BUILDING THE DATA QUALITY PRACTICE
    1. 18.1 Step 1: Recognize the Problem
    2. 18.2 Step 2: Management Support and the Data Ownership Policy
    3. 18.3 Step 3: Spread the Word
    4. 18.4 Step 4: Mapping the Information Chain
    5. 18.5 Step 5: Data Quality Scorecard
    6. 18.6 Step 6: Current State Assessment
    7. 18.7 Step 7: Requirements Assessment
    8. 18.8 Step 8: Choose a Project
    9. 18.9 Step 9: Build Your Team
    10. 18.10 Step 10: Build Your Arsenal
    11. 18.11 Step 11: Metadata Model
    12. 18.12 Step 12: Define Data Quality Rules
    13. 18.13 Step 13: Archaeology/Data Mining
    14. 18.14 Step 14: Manage Your Suppliers
    15. 18.15 Step 15: Execute the Improvement
    16. 18.16 Step 16: Measure Improvement
    17. 18.17 Step 17: Build on Each Success
    18. 18.18 Conclusion
  22. INDEX
  23. BIBLIOGRAPHY