Chapter 10

Data Profiling

We have considered the structure and models for data warehouse data, as well as the metadata that facilitates the use of that data. In fact, most of the book so far has centered on planning and infrastructure, that is, what goes into the project before you actually start. At this point, we finally begin to examine the data sets themselves, using a technique called data profiling, to understand how usable they are. Often, despite what you believe the source data sets contain, you may be surprised to find that some of them do not represent what you think they do.

Data profiling is a process of analyzing raw data for the purpose of characterizing the information embedded ...
