Measuring Data Quality for Ongoing Improvement

Book Description

The Data Quality Assessment Framework (DQAF) shows you how to measure and monitor data quality, ensuring quality over time. You’ll start with general concepts of measurement and work your way through a detailed framework of more than three dozen measurement types related to five objective dimensions of quality: completeness, timeliness, consistency, validity, and integrity. Ongoing measurement, rather than one-time activities, will help your organization reach a new level of data quality. This plain-language approach to measuring data can be understood by both business and IT, and it provides practical guidance on how to apply the DQAF within any organization, enabling you to prioritize measurements and report effectively on results. The book includes strategies for using data measurement to govern and improve the quality of data, along with guidelines for applying the framework within a data asset. You’ll come away able to prioritize which measurement types to implement, knowing where to place them in a data flow and how frequently to measure. Also included are common conceptual models for defining and storing data quality results for trend analysis, as well as generic business requirements for ongoing measuring and monitoring, including the calculations and comparisons that make the measurements meaningful and help you understand trends and detect anomalies.

  • Demonstrates how to leverage a technology-independent data quality measurement framework for your specific business priorities and data quality challenges
  • Enables discussions between business and IT with a non-technical vocabulary for data quality measurement
  • Describes how to measure data quality on an ongoing basis with generic measurement types that can be applied to any situation

Table of Contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. Dedication
  6. Acknowledgments
  7. Foreword
  8. Author Biography
  9. Introduction: Measuring Data Quality for Ongoing Improvement
    1. Data Quality Measurement: the Problem we are Trying to Solve
    2. Recurring Challenges in the Context of Data Quality
    3. DQAF: the Data Quality Assessment Framework
    4. Overview of Measuring Data Quality for Ongoing Improvement
    5. Intended Audience
    6. What Measuring Data Quality for Ongoing Improvement Does Not Do
    7. Why I Wrote Measuring Data Quality for Ongoing Improvement
  10. Section 1. Concepts and Definitions
    1. Chapter 1. Data
      1. Purpose
      2. Data
      3. Data as Representation
      4. Data as Facts
      5. Data as a Product
      6. Data as Input to Analyses
      7. Data and Expectations
      8. Information
      9. Concluding Thoughts
    2. Chapter 2. Data, People, and Systems
      1. Purpose
      2. Enterprise or Organization
      3. IT and the Business
      4. Data Producers
      5. Data Consumers
      6. Data Brokers
      7. Data Stewards and Data Stewardship
      8. Data Owners
      9. Data Ownership and Data Governance
      10. IT, the Business, and Data Owners, Redux
      11. Data Quality Program Team
      12. Stakeholder
      13. Systems and System Design
      14. Concluding Thoughts
    3. Chapter 3. Data Management, Models, and Metadata
      1. Purpose
      2. Data Management
      3. Database, Data Warehouse, Data Asset, Dataset
      4. Source System, Target System, System of Record
      5. Data Models
      6. Types of Data Models
      7. Physical Characteristics of Data
      8. Metadata
      9. Metadata as Explicit Knowledge
      10. Data Chain and Information Life Cycle
      11. Data Lineage and Data Provenance
      12. Concluding Thoughts
    4. Chapter 4. Data Quality and Measurement
      1. Purpose
      2. Data Quality
      3. Data Quality Dimensions
      4. Measurement
      5. Measurement as Data
      6. Data Quality Measurement and the Business/IT Divide
      7. Characteristics of Effective Measurements
      8. Data Quality Assessment
      9. Data Quality Dimensions, DQAF Measurement Types, Specific Data Quality Metrics
      10. Data Profiling
      11. Data Quality Issues and Data Issue Management
      12. Reasonability Checks
      13. Data Quality Thresholds
      14. Process Controls
      15. In-line Data Quality Measurement and Monitoring
      16. Concluding Thoughts
  11. Section 2. DQAF Concepts and Measurement Types
    1. Chapter 5. DQAF Concepts
      1. Purpose
      2. The Problem the DQAF Addresses
      3. Data Quality Expectations and Data Management
      4. The Scope of the DQAF
      5. DQAF Quality Dimensions
      6. Defining DQAF Measurement Types
      7. Metadata Requirements
      8. Objects of Measurement and Assessment Categories
      9. Functions in Measurement: Collect, Calculate, Compare
      10. Concluding Thoughts
    2. Chapter 6. DQAF Measurement Types
      1. Purpose
      2. Consistency of the Data Model
      3. Ensuring the Correct Receipt of Data for Processing
      4. Inspecting the Condition of Data upon Receipt
      5. Assessing the Results of Data Processing
      6. Assessing the Validity of Data Content
      7. Assessing the Consistency of Data Content
      8. Comments on the Placement of In-line Measurements
      9. Periodic Measurement of Cross-table Content Integrity
      10. Assessing Overall Database Content
      11. Assessing Controls and Measurements
      12. The Measurement Types: Consolidated Listing
      13. Concluding Thoughts
  12. Section 3. Data Assessment Scenarios
    1. Purpose
    2. Assessment Scenarios
    3. Metadata: Knowledge before Assessment
    4. Chapter 7. Initial Data Assessment
      1. Purpose
      2. Initial Assessment
      3. Input to Initial Assessments
      4. Data Expectations
      5. Data Profiling
      6. Column Property Profiling
      7. Structure Profiling
      8. Profiling an Existing Data Asset
      9. From Profiling to Assessment
      10. Deliverables from Initial Assessment
      11. Concluding Thoughts
    5. Chapter 8. Assessment in Data Quality Improvement Projects
      1. Purpose
      2. Data Quality Improvement Efforts
      3. Measurement in Improvement Projects
    6. Chapter 9. Ongoing Measurement
      1. Purpose
      2. The Case for Ongoing Measurement
      3. Example: Health Care Data
      4. Inputs for Ongoing Measurement
      5. Criticality and Risk
      6. Automation
      7. Controls
      8. Periodic Measurement
      9. Deliverables from Ongoing Measurement
      10. In-Line versus Periodic Measurement
      11. Concluding Thoughts
  13. Section 4. Applying the DQAF to Data Requirements
    1. Context
    2. Chapter 10. Requirements, Risk, Criticality
      1. Purpose
      2. Business Requirements
      3. Data Quality Requirements and Expected Data Characteristics
      4. Data Quality Requirements and Risks to Data
      5. Factors Influencing Data Criticality
      6. Specifying Data Quality Metrics
      7. Concluding Thoughts
    3. Chapter 11. Asking Questions
      1. Purpose
      2. Asking Questions
      3. Understanding the Project
      4. Learning about Source Systems
      5. Your Data Consumers’ Requirements
      6. The Condition of the Data
      7. The Data Model, Transformation Rules, and System Design
      8. Measurement Specification Process
      9. Concluding Thoughts
  14. Section 5. A Strategic Approach to Data Quality
    1. Chapter 12. Data Quality Strategy
      1. Purpose
      2. The Concept of Strategy
      3. Systems Strategy, Data Strategy, and Data Quality Strategy
      4. Data Quality Strategy and Data Governance
      5. Decision Points in the Information Life Cycle
      6. General Considerations for Data Quality Strategy
      7. Concluding Thoughts
    2. Chapter 13. Directives for Data Quality Strategy
      1. Purpose
      2. Directive 1: Obtain Management Commitment to Data Quality
      3. Directive 2: Treat Data as an Asset
      4. Directive 3: Apply Resources to Focus on Quality
      5. Directive 4: Build Explicit Knowledge of Data
      6. Directive 5: Treat Data as a Product of Processes that can be Measured and Improved
      7. Directive 6: Recognize Quality is Defined by Data Consumers
      8. Directive 7: Address the Root Causes of Data Problems
      9. Directive 8: Measure Data Quality, Monitor Critical Data
      10. Directive 9: Hold Data Producers Accountable for the Quality of their Data (and Knowledge about that Data)
      11. Directive 10: Provide Data Consumers with the Knowledge they Require for Data Use
      12. Directive 11: Data Needs and Uses will Evolve—Plan for Evolution
      13. Directive 12: Data Quality Goes beyond the Data—Build a Culture Focused on Quality
      14. Concluding Thoughts: Using the Current State Assessment
  15. Section 6. The DQAF in Depth
    1. Functions for Measurement: Collect, Calculate, Compare
    2. Features of the DQAF Measurement Logical Data Model
    3. Facets of the DQAF Measurement Types
    4. Chapter 14. Functions of Measurement: Collection, Calculation, Comparison
      1. Purpose
      2. Functions in Measurement: Collect, Calculate, Compare
      3. Collecting Raw Measurement Data
      4. Calculating Measurement Data
      5. Comparing Measurements to Past History
      6. Statistics
      7. The Control Chart: A Primary Tool for Statistical Process Control
      8. The DQAF and Statistical Process Control
      9. Concluding Thoughts
    5. Chapter 15. Features of the DQAF Measurement Logical Model
      1. Purpose
      2. Metric Definition and Measurement Result Tables
      3. Optional Fields
      4. Denominator Fields
      5. Automated Thresholds
      6. Manual Thresholds
      7. Emergency Thresholds
      8. Manual or Emergency Thresholds and Results Tables
      9. Additional System Requirements
      10. Support Requirements
      11. Concluding Thoughts
    6. Chapter 16. Facets of the DQAF Measurement Types
      1. Purpose
      2. Facets of the DQAF
      3. Organization of the Chapter
      4. Measurement Type #1: Dataset Completeness—Sufficiency of Metadata and Reference Data
      5. Measurement Type #2: Consistent Formatting in One Field
      6. Measurement Type #3: Consistent Formatting, Cross-table
      7. Measurement Type #4: Consistent Use of Default Value in One Field
      8. Measurement Type #5: Consistent Use of Default Values, Cross-table
      9. Measurement Type #6: Timely Delivery of Data for Processing
      10. Measurement Type #7: Dataset Completeness—Availability for Processing
      11. Measurement Type #8: Dataset Completeness—Record Counts to Control Records
      12. Measurement Type #9: Dataset Completeness—Summarized Amount Field Data
      13. Measurement Type #10: Dataset Completeness—Size Compared to Past Sizes
      14. Measurement Type #11: Record Completeness—Length
      15. Measurement Type #12: Field Completeness—Non-Nullable Fields
      16. Measurement Type #13: Dataset Integrity—De-Duplication
      17. Measurement Type #14: Dataset Integrity—Duplicate Record Reasonability Check
      18. Measurement Type #15: Field Content Completeness—Defaults from Source
      19. Measurement Type #16: Dataset Completeness Based on Date Criteria
      20. Measurement Type #17: Dataset Reasonability Based on Date Criteria
      21. Measurement Type #18: Field Content Completeness—Received Data is Missing Fields Critical to Processing
      22. Measurement Type #19: Dataset Completeness—Balance Record Counts Through a Process
      23. Measurement Type #20: Dataset Completeness—Reasons for Rejecting Records
      24. Measurement Type #21: Dataset Completeness Through a Process—Ratio of Input to Output
      25. Measurement Type #22: Dataset Completeness Through a Process—Balance Amount Fields
      26. Measurement Type #23: Field Content Completeness—Ratio of Summed Amount Fields
      27. Measurement Type #24: Field Content Completeness—Defaults from Derivation
      28. Measurement Type #25: Data Processing Duration
      29. Measurement Type #26: Timely Availability of Data for Access
      30. Measurement Type #27: Validity Check, Single Field, Detailed Results
      31. Measurement Type #28: Validity Check, Roll-up
      32. Measurement Logical Data Model
      33. Measurement Type #29: Validity Check, Multiple Columns within a Table, Detailed Results
      34. Measurement Type #30: Consistent Column Profile
      35. Measurement Type #31: Consistent Dataset Content, Distinct Count of Represented Entity, with Ratios to Record Counts
      36. Measurement Type #32: Consistent Dataset Content, Ratio of Distinct Counts of Two Represented Entities
      37. Measurement Type #33: Consistent Multicolumn Profile
      38. Measurement Type #34: Chronology Consistent with Business Rules within a Table
      39. Measurement Type #35: Consistent Time Elapsed (hours, days, months, etc.)
      40. Measurement Type #36: Consistent Amount Field Calculations Across Secondary Fields
      41. Measurement Type #37: Consistent Record Counts by Aggregated Date
      42. Measurement Type #38: Consistent Amount Field Data by Aggregated Date
      43. Measurement Type #39: Parent/Child Referential Integrity
      44. Measurement Type #40: Child/Parent Referential Integrity
      45. Measurement Type #41: Validity Check, Cross Table, Detailed Results
      46. Measurement Type #42: Consistent Cross-table Multicolumn Profile
      47. Measurement Type #43: Chronology Consistent with Business Rules Across-tables
      48. Measurement Type #44: Consistent Cross-table Amount Column Calculations
      49. Measurement Type #45: Consistent Cross-Table Amount Columns by Aggregated Dates
      50. Measurement Type #46: Consistency Compared to External Benchmarks
      51. Measurement Type #47: Dataset Completeness—Overall Sufficiency for Defined Purposes
      52. Measurement Type #48: Dataset Completeness—Overall Sufficiency of Measures and Controls
      53. Concluding Thoughts: Know Your Data
  16. Glossary
  17. Bibliography
  18. Index
  19. Online Materials
    1. Appendix A. Measuring the Value of Data
    2. Appendix B. Data Quality Dimensions
      1. Purpose
      2. Richard Wang’s and Diane Strong’s Data Quality Framework, 1996
      3. Thomas Redman’s Dimensions of Data Quality, 1996
      4. Larry English’s Information Quality Characteristics and Measures, 1999
    3. Appendix C. Completeness, Consistency, and Integrity of the Data Model
      1. Purpose
      2. Process Input and Output
      3. High-Level Assessment
      4. Detailed Assessment
      5. Quality of Definitions
      6. Summary
    4. Appendix D. Prediction, Error, and Shewhart’s Lost Disciple, Kristo Ivanov
      1. Purpose
      2. Limitations of the Communications Model of Information Quality
      3. Error, Prediction, and Scientific Measurement
      4. What Do We Learn from Ivanov?
      5. Ivanov’s Concept of the System as Model
    5. Appendix E. Quality Improvement and Data Quality
      1. Purpose
      2. A Brief History of Quality Improvement
      3. Process Improvement Tools
      4. Implications for Data Quality
      5. Limitations of the Data as Product Metaphor
      6. Concluding Thoughts: Building Quality in Means Building Knowledge in