The process of separating your data into tables and creating primary keys is called normalization. Its main goal is to make sure each piece of information appears in the database only once. Duplicating data is very inefficient, because it makes databases larger than they need to be and therefore slows down access. More importantly, the presence of duplicates creates a strong risk that you’ll update only one row of the duplicated data, creating inconsistencies and potentially causing serious errors.
If you list the titles of books in the authors table as well as the books table, for example, and you have to correct a typographic error in a title, you'll have to search through both tables and make sure you make the same change in every place the title is listed. It's better to keep the title in one place, the books table, and use the ISBN in the authors table to refer to it.
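This idea can be sketched with a small example, here using Python's built-in sqlite3 module. The table layout, column names, and sample data are hypothetical, invented only to illustrate the point: the title is stored once in books, authors rows refer to it by ISBN, and a typo fix touches exactly one row.

```python
import sqlite3

# In-memory database for illustration only; the schema and data below
# are hypothetical, not taken from any specific example in the text.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The title is stored once, in books, keyed by ISBN.
cur.execute("CREATE TABLE books (isbn TEXT PRIMARY KEY, title TEXT)")

# Authors reference a book by its ISBN instead of repeating the title.
cur.execute("""CREATE TABLE authors (
    author TEXT,
    isbn   TEXT REFERENCES books(isbn)
)""")

# A placeholder ISBN and a deliberately misspelled title.
cur.execute("INSERT INTO books VALUES ('000-0-00-000000-0', 'Databse Design')")
cur.executemany("INSERT INTO authors VALUES (?, ?)",
                [("A. Author", "000-0-00-000000-0"),
                 ("B. Coauthor", "000-0-00-000000-0")])

# Correcting the typo means updating exactly one row...
cur.execute("UPDATE books SET title = 'Database Design' "
            "WHERE isbn = '000-0-00-000000-0'")

# ...and every author row picks up the corrected title via a join.
rows = cur.execute("""SELECT a.author, b.title
                      FROM authors a
                      JOIN books b ON a.isbn = b.isbn""").fetchall()
```

Had the title been duplicated into authors as well, the UPDATE would have had to hit both tables, and missing one would leave the database inconsistent.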
When splitting a database into multiple tables, it is also important not to go too far and create more tables than necessary. Over-splitting leads to inefficient design and slower access, because queries must then perform extra joins to reassemble the data.
Luckily, E.F. Codd, the inventor of the relational model, analyzed the concept of normalization and split it into three separate stages known as First, Second, and Third Normal Form. If you modify a database to satisfy each of these forms in order, you will ensure that your database is well balanced for fast access and minimal memory and disk space usage.
To see how the normalization process works, let’s start with the rather monstrous ...