10.3. Scrubbing Constraints

At the point at which you have the raw data scrubbed this far, there is a temptation to simply load it into the “real tables” in the database. Resist the temptation. The syntax of the data might be acceptable, but that does not mean it is right.

We can classify errors as single-column or multicolumn errors. A single-column error might be a gender code of ’B’ when only ’M’ or ’F’ is allowed. A multiple-column error involves individual columns that are valid, but the combination of which is invalid. For example, pregnancy is a valid medical condition; male is a valid gender; but a pregnant male is an invalid combination.

The first test is to see if your key is actually a key by running a test for NULLS and counting ...

Get Joe Celko's Thinking in Sets: Auxiliary, Temporal, and Virtual Tables in SQL now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.