Measuring Data Quality for Ongoing Improvement

Book Description

The Data Quality Assessment Framework (DQAF) shows you how to measure and monitor data quality, ensuring quality over time. You’ll start with general concepts of measurement and work your way through a detailed framework of more than three dozen measurement types related to five objective dimensions of quality: completeness, timeliness, consistency, validity, and integrity. Ongoing measurement, rather than one-time activities, will help your organization reach a new level of data quality. This plain-language approach to measuring data can be understood by both business and IT, and it provides practical guidance on how to apply the DQAF within any organization, enabling you to prioritize measurements and effectively report on results. Strategies for using data measurement to govern and improve the quality of data, and guidelines for applying the framework within a data asset, are included. You’ll come away able to prioritize which measurement types to implement, knowing where to place them in a data flow and how frequently to measure. Also included are common conceptual models for defining and storing data quality results for purposes of trend analysis, as well as generic business requirements for ongoing measuring and monitoring, including the calculations and comparisons that make the measurements meaningful and help you understand trends and detect anomalies.

• Demonstrates how to leverage a technology-independent data quality measurement framework for your specific business priorities and data quality challenges

• Enables discussions between business and IT with a non-technical vocabulary for data quality measurement

• Describes how to measure data quality on an ongoing basis with generic measurement types that can be applied to any situation

Table of Contents

        1. Cover image
        2. Title page
        3. Table of Contents
        4. Copyright
        5. Dedication
        6. Acknowledgments
        7. Foreword
        8. Author Biography
        9. Introduction: Measuring Data Quality for Ongoing Improvement
          1. Data Quality Measurement: the Problem we are Trying to Solve
          2. Recurring Challenges in the Context of Data Quality
          3. DQAF: the Data Quality Assessment Framework
          4. Overview of Measuring Data Quality for Ongoing Improvement
          5. Intended Audience
          6. What Measuring Data Quality for Ongoing Improvement Does Not Do
          7. Why I Wrote Measuring Data Quality for Ongoing Improvement
        10. Section 1. Concepts and Definitions
          1. Chapter 1. Data
            1. Purpose
            2. Data
            3. Data as Representation
            4. Data as Facts
            5. Data as a Product
            6. Data as Input to Analyses
            7. Data and Expectations
            8. Information
            9. Concluding Thoughts
          2. Chapter 2. Data, People, and Systems
            1. Purpose
            2. Enterprise or Organization
            3. IT and the Business
            4. Data Producers
            5. Data Consumers
            6. Data Brokers
            7. Data Stewards and Data Stewardship
            8. Data Owners
            9. Data Ownership and Data Governance
            10. IT, the Business, and Data Owners, Redux
            11. Data Quality Program Team
            12. Stakeholder
            13. Systems and System Design
            14. Concluding Thoughts
          3. Chapter 3. Data Management, Models, and Metadata
            1. Purpose
            2. Data Management
            3. Database, Data Warehouse, Data Asset, Dataset
            4. Source System, Target System, System of Record
            5. Data Models
            6. Types of Data Models
            7. Physical Characteristics of Data
            8. Metadata
            9. Metadata as Explicit Knowledge
            10. Data Chain and Information Life Cycle
            11. Data Lineage and Data Provenance
            12. Concluding Thoughts
          4. Chapter 4. Data Quality and Measurement
            1. Purpose
            2. Data Quality
            3. Data Quality Dimensions
            4. Measurement
            5. Measurement as Data
            6. Data Quality Measurement and the Business/IT Divide
            7. Characteristics of Effective Measurements
            8. Data Quality Assessment
            9. Data Quality Dimensions, DQAF Measurement Types, Specific Data Quality Metrics
            10. Data Profiling
            11. Data Quality Issues and Data Issue Management
            12. Reasonability Checks
            13. Data Quality Thresholds
            14. Process Controls
            15. In-line Data Quality Measurement and Monitoring
            16. Concluding Thoughts
        11. Section 2. DQAF Concepts and Measurement Types
          1. Chapter 5. DQAF Concepts
            1. Purpose
            2. The Problem the DQAF Addresses
            3. Data Quality Expectations and Data Management
            4. The Scope of the DQAF
            5. DQAF Quality Dimensions
            6. Defining DQAF Measurement Types
            7. Metadata Requirements
            8. Objects of Measurement and Assessment Categories
            9. Functions in Measurement: Collect, Calculate, Compare
            10. Concluding Thoughts
          2. Chapter 6. DQAF Measurement Types
            1. Purpose
            2. Consistency of the Data Model
            3. Ensuring the Correct Receipt of Data for Processing
            4. Inspecting the Condition of Data upon Receipt
            5. Assessing the Results of Data Processing
            6. Assessing the Validity of Data Content
            7. Assessing the Consistency of Data Content
            8. Comments on the Placement of In-line Measurements
            9. Periodic Measurement of Cross-table Content Integrity
            10. Assessing Overall Database Content
            11. Assessing Controls and Measurements
            12. The Measurement Types: Consolidated Listing
            13. Concluding Thoughts
        12. Section 3. Data Assessment Scenarios
          1. Purpose
          2. Assessment Scenarios
          3. Metadata: Knowledge before Assessment
          4. Chapter 7. Initial Data Assessment
            1. Purpose
            2. Initial Assessment
            3. Input to Initial Assessments
            4. Data Expectations
            5. Data Profiling
            6. Column Property Profiling
            7. Structure Profiling
            8. Profiling an Existing Data Asset
            9. From Profiling to Assessment
            10. Deliverables from Initial Assessment
            11. Concluding Thoughts
          5. Chapter 8. Assessment in Data Quality Improvement Projects
            1. Purpose
            2. Data Quality Improvement Efforts
            3. Measurement in Improvement Projects
          6. Chapter 9. Ongoing Measurement
            1. Purpose
            2. The Case for Ongoing Measurement
            3. Example: Health Care Data
            4. Inputs for Ongoing Measurement
            5. Criticality and Risk
            6. Automation
            7. Controls
            8. Periodic Measurement
            9. Deliverables from Ongoing Measurement
            10. In-Line versus Periodic Measurement
            11. Concluding Thoughts
        13. Section 4. Applying the DQAF to Data Requirements
          1. Context
          2. Chapter 10. Requirements, Risk, Criticality
            1. Purpose
            2. Business Requirements
            3. Data Quality Requirements and Expected Data Characteristics
            4. Data Quality Requirements and Risks to Data
            5. Factors Influencing Data Criticality
            6. Specifying Data Quality Metrics
            7. Concluding Thoughts
          3. Chapter 11. Asking Questions
            1. Purpose
            2. Asking Questions
            3. Understanding the Project
            4. Learning about Source Systems
            5. Your Data Consumers’ Requirements
            6. The Condition of the Data
            7. The Data Model, Transformation Rules, and System Design
            8. Measurement Specification Process
            9. Concluding Thoughts
        14. Section 5. A Strategic Approach to Data Quality
          1. Chapter 12. Data Quality Strategy
            1. Purpose
            2. The Concept of Strategy
            3. Systems Strategy, Data Strategy, and Data Quality Strategy
            4. Data Quality Strategy and Data Governance
            5. Decision Points in the Information Life Cycle
            6. General Considerations for Data Quality Strategy
            7. Concluding Thoughts
          2. Chapter 13. Directives for Data Quality Strategy
            1. Purpose
            2. Directive 1: Obtain Management Commitment to Data Quality
            3. Directive 2: Treat Data as an Asset
            4. Directive 3: Apply Resources to Focus on Quality
            5. Directive 4: Build Explicit Knowledge of Data
            6. Directive 5: Treat Data as a Product of Processes that can be Measured and Improved
            7. Directive 6: Recognize Quality is Defined by Data Consumers
            8. Directive 7: Address the Root Causes of Data Problems
            9. Directive 8: Measure Data Quality, Monitor Critical Data
            10. Directive 9: Hold Data Producers Accountable for the Quality of their Data (and Knowledge about that Data)
            11. Directive 10: Provide Data Consumers with the Knowledge they Require for Data Use
            12. Directive 11: Data Needs and Uses will Evolve—Plan for Evolution
            13. Directive 12: Data Quality Goes beyond the Data—Build a Culture Focused on Quality
            14. Concluding Thoughts: Using the Current State Assessment
        15. Section 6. The DQAF in Depth
          1. Functions for Measurement: Collect, Calculate, Compare
          2. Features of the DQAF Measurement Logical Data Model
          3. Facets of the DQAF Measurement Types
          4. Chapter 14. Functions of Measurement: Collection, Calculation, Comparison
            1. Purpose
            2. Functions in Measurement: Collect, Calculate, Compare
            3. Collecting Raw Measurement Data
            4. Calculating Measurement Data
            5. Comparing Measurements to Past History
            6. Statistics
            7. The Control Chart: A Primary Tool for Statistical Process Control
            8. The DQAF and Statistical Process Control
            9. Concluding Thoughts
          5. Chapter 15. Features of the DQAF Measurement Logical Model
            1. Purpose
            2. Metric Definition and Measurement Result Tables
            3. Optional Fields
            4. Denominator Fields
            5. Automated Thresholds
            6. Manual Thresholds
            7. Emergency Thresholds
            8. Manual or Emergency Thresholds and Results Tables
            9. Additional System Requirements
            10. Support Requirements
            11. Concluding Thoughts
          6. Chapter 16. Facets of the DQAF Measurement Types
            1. Purpose
            2. Facets of the DQAF
            3. Organization of the Chapter
            4. Measurement Type #1: Dataset Completeness—Sufficiency of Metadata and Reference Data
            5. Measurement Type #2: Consistent Formatting in One Field
            6. Measurement Type #3: Consistent Formatting, Cross-table
            7. Measurement Type #4: Consistent Use of Default Value in One Field
            8. Measurement Type #5: Consistent Use of Default Values, Cross-table
            9. Measurement Type #6: Timely Delivery of Data for Processing
            10. Measurement Type #7: Dataset Completeness—Availability for Processing
            11. Measurement Type #8: Dataset Completeness—Record Counts to Control Records
            12. Measurement Type #9: Dataset Completeness—Summarized Amount Field Data
            13. Measurement Type #10: Dataset Completeness—Size Compared to Past Sizes
            14. Measurement Type #11: Record Completeness—Length
            15. Measurement Type #12: Field Completeness—Non-Nullable Fields
            16. Measurement Type #13: Dataset Integrity—De-Duplication
            17. Measurement Type #14: Dataset Integrity—Duplicate Record Reasonability Check
            18. Measurement Type #15: Field Content Completeness—Defaults from Source
            19. Measurement Type #16: Dataset Completeness Based on Date Criteria
            20. Measurement Type #17: Dataset Reasonability Based on Date Criteria
            21. Measurement Type #18: Field Content Completeness—Received Data is Missing Fields Critical to Processing
            22. Measurement Type #19: Dataset Completeness—Balance Record Counts Through a Process
            23. Measurement Type #20: Dataset Completeness—Reasons for Rejecting Records
            24. Measurement Type #21: Dataset Completeness Through a Process—Ratio of Input to Output
            25. Measurement Type #22: Dataset Completeness Through a Process—Balance Amount Fields
            26. Measurement Type #23: Field Content Completeness—Ratio of Summed Amount Fields
            27. Measurement Type #24: Field Content Completeness—Defaults from Derivation
            28. Measurement Type #25: Data Processing Duration
            29. Measurement Type #26: Timely Availability of Data for Access
            30. Measurement Type #27: Validity Check, Single Field, Detailed Results
            31. Measurement Type #28: Validity Check, Roll-up
            32. Measurement Logical Data Model
            33. Measurement Type #29: Validity Check, Multiple Columns within a Table, Detailed Results
            34. Measurement Type #30: Consistent Column Profile
            35. Measurement Type #31: Consistent Dataset Content, Distinct Count of Represented Entity, with Ratios to Record Counts
            36. Measurement Type #32: Consistent Dataset Content, Ratio of Distinct Counts of Two Represented Entities
            37. Measurement Type #33: Consistent Multicolumn Profile
            38. Measurement Type #34: Chronology Consistent with Business Rules within a Table
            39. Measurement Type #35: Consistent Time Elapsed (hours, days, months, etc.)
            40. Measurement Type #36: Consistent Amount Field Calculations Across Secondary Fields
            41. Measurement Type #37: Consistent Record Counts by Aggregated Date
            42. Measurement Type #38: Consistent Amount Field Data by Aggregated Date
            43. Measurement Type #39: Parent/Child Referential Integrity
            44. Measurement Type #40: Child/Parent Referential Integrity
            45. Measurement Type #41: Validity Check, Cross-table, Detailed Results
            46. Measurement Type #42: Consistent Cross-table Multicolumn Profile
            47. Measurement Type #43: Chronology Consistent with Business Rules Across Tables
            48. Measurement Type #44: Consistent Cross-table Amount Column Calculations
            49. Measurement Type #45: Consistent Cross-table Amount Columns by Aggregated Dates
            50. Measurement Type #46: Consistency Compared to External Benchmarks
            51. Measurement Type #47: Dataset Completeness—Overall Sufficiency for Defined Purposes
            52. Measurement Type #48: Dataset Completeness—Overall Sufficiency of Measures and Controls
            53. Concluding Thoughts: Know Your Data
        16. Glossary
        17. Bibliography
        18. Index
        19. Online Materials
          1. Appendix A. Measuring the Value of Data
          2. Appendix B. Data Quality Dimensions
            1. Purpose
            2. Richard Wang’s and Diane Strong’s Data Quality Framework, 1996
            3. Thomas Redman’s Dimensions of Data Quality, 1996
            4. Larry English’s Information Quality Characteristics and Measures, 1999
          3. Appendix C. Completeness, Consistency, and Integrity of the Data Model
            1. Purpose
            2. Process Input and Output
            3. High-Level Assessment
            4. Detailed Assessment
            5. Quality of Definitions
            6. Summary
          4. Appendix D. Prediction, Error, and Shewhart’s Lost Disciple, Kristo Ivanov
            1. Purpose
            2. Limitations of the Communications Model of Information Quality
            3. Error, Prediction, and Scientific Measurement
            4. What Do We Learn from Ivanov?
            5. Ivanov’s Concept of the System as Model
          5. Appendix E. Quality Improvement and Data Quality
            1. Purpose
            2. A Brief History of Quality Improvement
            3. Process Improvement Tools
            4. Implications for Data Quality
            5. Limitations of the Data as Product Metaphor
            6. Concluding Thoughts: Building Quality in Means Building Knowledge in