This book has as subtitle Normal Forms and All That Jazz. Clearly some explanation is needed! First of all, of course, I’m talking about design theory, and everybody knows normal forms are a major component of that theory; hence the first part of my subtitle. But there’s more to the theory than just normal forms, and that fact accounts for that subtitle’s second part. Third, it’s unfortunately the case that—from the practitioner’s point of view, at any rate—design theory is riddled with terms and concepts that seem to be difficult to understand and don’t seem to have much to do with design as actually done in practice. That’s why I framed the latter part of my subtitle in colloquial (not to say slangy) terms; I wanted to convey the idea, or impression, that although we’d necessarily be dealing with “difficult” material on occasion, the treatment of that material would be as undaunting and unintimidating as I could make it. But whether I’ve succeeded in that aim is for you to judge, of course.
I’d also like to say a little more on the question of whether design theory has anything to do with design as done in practice. Let me be clear: Nobody could, or should, claim that designing databases is easy. But a sound knowledge of theory can only help. In fact, if you want to do design properly—if you want to build databases that are as robust, flexible, and accurate as they’re supposed to be—then you simply have to come to grips with design theory. There’s just no alternative: at least, not if you want to claim to be a professional. Design theory is the scientific foundation for database design, just as the relational model is the scientific foundation for database technology in general. And just as anyone professionally involved in database technology in general needs to be familiar with the relational model, so anyone involved in database design in particular needs to be familiar with design theory. Proper design is so important! After all, the database lies at the heart of so much of what we do in the computing world; so if it’s badly designed, the negative impacts can be extraordinarily widespread.
Since we’re going to be talking quite a lot about normal forms, I thought it might be—well, not enlightening, perhaps, but entertaining (?)—to begin with a few quotes from the literature. The starting point for the whole concept of normal forms is, of course, first normal form (1NF), and so an obvious question is: Do you know what 1NF is? As the following quotes demonstrate (sources omitted to protect the guilty), a lot of people don’t:
To achieve first normal form, each field in a table must convey unique information.
An entity is said to be in the first normal form (1NF) when all attributes are single valued.
A relation is in 1NF if and only if all underlying domains contain atomic values only.
If there are no repeating groups of attributes, then [the table] is in 1NF.
Now, it might be argued that some if not all of these quotes are at least vaguely correct—but they’re all hopelessly sloppy, even when they’re generally on the right lines. (In case you’re wondering, I’ll be giving a precise and accurate definition of 1NF in Chapter 4.)
Let’s take a closer look at what’s going on here. Here again is the first of the foregoing quotes, now given in full:
To achieve first normal form, each field in a table must convey unique information. For example, if you had a Customer table with two columns for the telephone number, your design would violate first normal form. First normal form is fairly easy to achieve, since few folks would see a need for duplicate information in a table.
OK, so apparently we’re talking about a design that looks something like this:
Now, I can’t say whether this is a good design or not, but it certainly doesn’t violate 1NF. (I can’t say whether it’s a good design because I don’t know exactly what “two columns for the telephone number” means. The phrase “duplicate information in a table” suggests we’re recording the same phone number twice, but such an interpretation is absurd on its face. But even if that interpretation is correct, it still wouldn’t constitute a violation of 1NF as such.)
Here’s another one:
First Normal Form ... means the table should have no “repeating groups” of fields ... A repeating group is when you repeat the same basic attribute (field) over and over again. A good example of this is when you wish to store the items you buy at a grocery store ... [and the writer goes on to give an example, presumably meant to illustrate the concept of a repeating group, of a table called Item Table with columns called Customer, Item1, Item2, Item3, and Item4]:
Well, this design is almost certainly bad—what happens if the customer doesn’t purchase exactly four items?—but the reason it’s bad isn’t that it violates 1NF; like the previous example, in fact, it’s a 1NF design. And while it’s true that 1NF does mean, loosely, “no repeating groups,” a repeating group is not “when you repeat the same basic attribute over and over again.” (What it really is I’ll explain in Chapter 4, when I explain what 1NF really is.)
How about this one (a cry for help found on the Internet)? I’m quoting it absolutely verbatim, except that I’ve added some boldface:
I have been trying to find the correct way of normalizing tables in Access. From what I understand, it goes from the 1st normal form to 2nd, then 3rd. Usually, that’s as far as it goes, but sometimes to the 5th and 6th. Then, there’s also the Cobb 3rd. This all makes sense to me. I am supposed to teach a class in this starting next week, and I just got the textbook. It says something entirely different. It says 2nd normal form is only for tables with a multiple-field primary key, 3rd normal form is only for tables with a single-field key. 4th normal form can go from 1st to 4th, where there are no independent one-to-many relationships between primary key and non-key fields. Can someone clear this up for me please?
> It’s not clear to me what “normalized” means. Can you be specific about what normalization rules you are
> referring to? In what way is my schema not normalized?
Normalization: The process of replacing duplicate things with a reference to the original thing.
For example, given “john is-a person” and “john obeys army,” one observes that the “john” in the second sentence is a duplicate of “john” in the first sentence. Using the means provided by your system, the second sentence should be stored as “–>john obeys army.”