Third Normal Form

Second normal form is good, but we can do better. We have seen that if a table scheme is in second normal form, then no strictly informational attribute depends on a proper subset of a key. However, there is another undesirable possibility. Let us illustrate with an example.

Consider the following table scheme and assume, for the purposes of illustration, that no two books with the same title have the same publisher:

{Title,PubID,PageCount,Price}

The only key for this table scheme is {Title,PubID}. Both PageCount and Price are informational attributes only.

Now, let us assume that each publisher decides the price of its books based solely on the page count. First, we observe that this table is in second normal form. To see this, consider the proper subsets of the key. These are:

{Title} and {PubID}

But none of the dependencies:

{Title} → {PageCount}
{Title} → {Price}
{PubID} → {PageCount}
{PubID} → {Price}

hold for this table scheme. After all, knowing the title does not determine the book, since there may be many books with the same title, published by different publishers. Likewise, knowing the publisher does not determine the book, since a publisher may publish many books. Hence, the table is in second normal form.
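The second normal form check above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list FDS is a hypothetical encoding of the example (the key determines every attribute, and each publisher prices by page count), and closure is an illustrative helper, not something taken from Access.

```python
# Assumed dependencies for the example scheme (hypothetical encoding).
FDS = [
    ({"Title", "PubID"}, {"PageCount", "Price"}),  # the key determines everything
    ({"PubID", "PageCount"}, {"Price"}),           # each publisher prices by page count
]

def closure(attrs, fds):
    """Return every attribute determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

informational = {"PageCount", "Price"}

# A 2NF violation would be a proper subset of the key that determines
# an informational attribute.
violations = [
    (subset, attr)
    for subset in ({"Title"}, {"PubID"})
    for attr in sorted(informational)
    if attr in closure(subset, FDS)
]
print(violations)  # []
```

Neither {Title} nor {PubID} determines anything beyond itself under these assumed dependencies, so the list of violations is empty.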

It is also not correct to say that:

{PageCount} → {Price}

holds, because different publishers may use different pricing schemes based on page count. In other words, one publisher may charge one price for books over 1000 pages, whereas another may charge a different price for books of that length. However, it is true that:

{PubID,PageCount} → {Price}

holds. In other words, here we have an informational attribute (Price) that depends not on a proper subset of a key, but on a proper subset of a key (PubID) together with another informational attribute (PageCount).

This is bad, since it may produce redundancy. For instance, consider Table 4-3. Note that the Price value in the third row is redundant. After all, we could fill in that value if it were blank, because the second row already tells us that the publisher with PubID 2 charges $34.95 for 500-page books.

Table 4-3. Redundant Data in a Table

Title        PubID   PageCount   Price
Moby Dick    1       500         29.95
Giant        2       500         34.95
Moby Dick    2       500         34.95
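As a rough check, the rows of Table 4-3 can be tested against a proposed dependency. The sketch below is illustrative only; holds_in_rows is a hypothetical helper name. Keep in mind that sample rows can only refute a dependency: a result of True means that these particular rows do not contradict it, not that it holds for the table scheme.

```python
# Rows of Table 4-3 as (Title, PubID, PageCount, Price).
COLUMNS = ("Title", "PubID", "PageCount", "Price")
ROWS = [
    ("Moby Dick", 1, 500, 29.95),
    ("Giant", 2, 500, 34.95),
    ("Moby Dick", 2, 500, 34.95),
]

def holds_in_rows(lhs, rhs, rows=ROWS, columns=COLUMNS):
    """Return False if two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        record = dict(zip(columns, row))
        left = tuple(record[a] for a in lhs)
        right = tuple(record[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value.
        if seen.setdefault(left, right) != right:
            return False
    return True

print(holds_in_rows(["PageCount"], ["Price"]))           # False
print(holds_in_rows(["PubID", "PageCount"], ["Price"]))  # True
```

The first call fails because two 500-page books carry different prices; the second succeeds because the two rows for PubID 2 with 500 pages carry the same price, which is exactly the repeated information.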

We can summarize the problem with the dependency:

{PubID,PageCount} → {Price}

by saying that the attribute Price depends upon a set of attributes:

{PubID,PageCount}

that is not a key, not a superkey, and not a proper subset of a key. It is a mix containing one attribute from the key {Title,PubID} and one attribute that is not in any key.

With this example in mind, we can now define third normal form. A table scheme is in third normal form, or 3NF, if it is not possible to have a dependency of the form:

{A1,...,Ak} → {B}

where B does not belong to any key (is strictly informational), B is not itself one of A1,...,Ak, and {A1,...,Ak} is not a superkey. In other words, apart from the trivial dependence of an attribute on a set that contains it, third normal form does not permit any strictly informational attribute to depend upon anything other than a superkey. Of course, superkeys determine all attributes, including strictly informational attributes, and so all attributes depend on any superkey. The point is that, with third normal form, strictly informational attributes depend only on superkeys.
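The definition can be turned into a small check. The following Python sketch is one possible reading of it, not a definitive implementation: ATTRS and FDS are a hypothetical encoding of the book example, the function names are illustrative, and only the dependencies listed in FDS are examined.

```python
from itertools import combinations

# Hypothetical encoding of the example scheme and its assumed dependencies.
ATTRS = {"Title", "PubID", "PageCount", "Price"}
FDS = [
    ({"Title", "PubID"}, {"PageCount", "Price"}),
    ({"PubID", "PageCount"}, {"Price"}),
]

def closure(attrs, fds):
    """Return every attribute determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole scheme."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            is_superkey = closure(candidate, fds) == attrs
            # Smaller keys are found first, so any superset of one is not minimal.
            if is_superkey and not any(k <= candidate for k in keys):
                keys.append(candidate)
    return keys

def third_normal_form_violations(attrs, fds):
    """List (left side, B) pairs where B is in no key and the left side is not a superkey."""
    key_attrs = set().union(*candidate_keys(attrs, fds))
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == attrs:
            continue  # the left side is a superkey, which 3NF allows
        for b in sorted(rhs - lhs - key_attrs):
            violations.append((sorted(lhs), b))
    return violations

print([sorted(k) for k in candidate_keys(ATTRS, FDS)])  # [['PubID', 'Title']]
print(third_normal_form_violations(ATTRS, FDS))  # [(['PageCount', 'PubID'], 'Price')]
```

Under these assumptions the only key is {Title,PubID}, and the single violation reported is the dependency of Price on {PubID,PageCount}, so the example scheme is not in third normal form.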
