Part II: Data Quality—Profiling and Improvement
Introduction
This second part of the book presents methods that you can use to profile and improve the quality of your data.
Profiling
The profiling methods focus primarily on advanced features of the data, such as:
• the structure of missing values in a one-row-per-subject data mart
• the structure of missing values in a time series data mart
• missing observations in time series data, that is, records that are absent altogether
• the detection of complex outliers like multivariate outliers or outliers in time series data
• the detection of duplicate records in the data based on matching algorithms
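To make the first three items more concrete, the following is a minimal sketch in Python and not the book's own implementation. It assumes a small hypothetical customer table in which None marks a missing value, and a hypothetical monthly series stored as (year, month) pairs. The function names missing_patterns and missing_periods, the variable names, and all data values are invented for illustration only.

```python
from collections import Counter

# Hypothetical one-row-per-subject data mart; None marks a missing value.
customers = [
    {"id": 1, "age": 34, "income": 52000, "region": "N"},
    {"id": 2, "age": None, "income": 48000, "region": "S"},
    {"id": 3, "age": None, "income": None, "region": "S"},
    {"id": 4, "age": 51, "income": None, "region": None},
    {"id": 5, "age": None, "income": 61000, "region": "E"},
]


def missing_patterns(rows, variables):
    """Count how often each combination of missing variables occurs."""
    patterns = Counter()
    for row in rows:
        # The pattern is the tuple of variable names that are missing in this row.
        pattern = tuple(v for v in variables if row.get(v) is None)
        patterns[pattern] += 1
    return patterns


def missing_periods(observed, start, end):
    """Return the (year, month) periods from start to end that have no record."""
    seen = set(observed)
    year, month = start
    gaps = []
    while (year, month) <= end:
        if (year, month) not in seen:
            gaps.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return gaps


patterns = missing_patterns(customers, ["age", "income", "region"])

# Hypothetical monthly series in which two months have no record at all.
observed_months = [(2023, 1), (2023, 2), (2023, 5), (2023, 6)]
gaps = missing_periods(observed_months, (2023, 1), (2023, 6))

for pattern, count in patterns.most_common():
    print(pattern if pattern else "(complete)", count)
print("Periods without a record:", gaps)
```

The pattern counts show which variables tend to be missing together, which is more informative than a per-variable missing rate. The gap check addresses a different problem: a record that is absent altogether never shows up as a missing value, so it has to be found by comparing the observed periods against the expected sequence.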
Methods for simple data profiling and validation are mentioned only briefly; for these, the reader is directed to the respective references. ...