You’ve been creating tables without giving much thought to them. And that’s fine, they work. You can
UPDATE with them. But as you get more data, you start seeing things you wish you’d done to make your
WHERE clauses simpler. What you need is to make your tables more normal.
Jack and Mark both created tables to store information about record-setting fish. Mark’s table has columns for the species and common names of the fish, its weight, and where it was caught. It doesn’t include the names of the people who caught the fish.
Jack’s table has the common name and weight of the fish, but it also contains the first and last names of the people who caught them, and it breaks down the location into a column containing the name of the body of water where the fish was caught, and a separate state column.
SQL is known as a Relational Database Management System, or RDBMS. Don’t bother memorizing it. We only care about the word RELATIONAL. All this means to you is that to design a killer table, you need to consider how the columns relate to each other to describe a thing.
The challenge is to describe the thing using columns in a way that makes getting the information out of it easy. This depends on what you need from the table, but there are some very broad steps you can follow when you’re creating a table.
Pick your thing, the one thing you want your table to describe.
Make a list of the information you need to know about your one thing when you’re using the table.
Using the list, break down the information about your thing into pieces you can use for organizing your table.
At least, not in this case. If Jack had been writing an article about the best places to go on vacation and catch a big fish, then he might have wanted the street number and name so readers could find accommodations nearby.
But Jack only needed location and state, so he only added as many columns as he needed to save space in his database. At that point, he decided his data was broken down enough—it is atomic.
What’s an atom? A little piece of information that can’t or shouldn’t be divided. It’s the same for your data. When it’s ATOMIC, that means that it’s been broken down into the smallest pieces of data that can’t or shouldn’t be divided.
Consider a pizza delivery guy. To get to where he’s going, he just needs a street number and address in a single column. For his purposes, that’s atomic. He never needs to look for a single street number on its own.
In fact, if his data were broken into street number and street name, his queries would have to be longer and more complicated, making it take him longer to get the pizza to your front door.
Well, making your tables NORMAL means they follow some standard rules your new designers will understand. And the good news is, our tables with atomic data are halfway there.
Making your data atomic is the first step in creating a NORMAL table.
With less data to search through, your queries will be faster.
Because, even when your tables are tiny, it adds up. And tables grow. If you begin with a normalized table, you won’t have to go back and change your table when your queries go too slowly.
Remember the clown table? Clown tracking has become a nationwide craze, and our old table isn’t going to cut it because the
activities columns contain so much data. For our purposes, this table is not atomic.
Cherry Hill Senior Center
F, red hair, green dress, huge feet
balloons, little car
Jack Green’s party
M, orange hair, blue suit, huge feet
F, yellow shirt, baggy blue pants
Eric Gray’s Party
M, cigar, black hair, tiny hat
Belmont Senior Center
F, pink hair, huge flower, blue dress
M, blue hair, red suit, huge nose
F, orange suit, baggy pants
F, all pink and sparkly
balancing, little car
M, in drag, polka dotted dress
M, green and purple suit, pointy nose
climbing into tiny car
To be 1NF, a table must follow these two rules:
Each row of data must contain atomic values.
Each row of data must have a unique identifier, known as a Primary Key.
The column in your table that will be your primary key has to be designated as such when you create the table. In a few pages, we’ll create a table and designate a primary key, but before that, let’s take a closer look at what a primary key is.
A primary key is a column in your table that makes each record unique.
The primary key is used to uniquely identify each record
Which means that the data in the primary key column can’t be repeated. Consider a table with the columns shown below. Do you think any of those would make good primary keys?
Take care using SSNs as the Primary Keys for your records.
With identity theft only increasing, people don’t want to give out SSNs—and with good reason. They’re too important to risk. Can you absolutely guarantee that your database is secure? If it’s not, all those SSNs can be stolen, along with your customers’ identities.
If it’s null, it can’t be unique because other records can also be NULL.
The primary key must be given a value when the record is inserted
When you insert a record without a primary key, you run the risk of ending up with a NULL primary key and duplicate rows in your table, which violates First Normal Form.
The primary key must be compact
A primary key should contain only the information it needs to to be unique and nothing extra.
The primary key values can’t be changed
If you could change the value of your key, you’d risk accidentally setting it to a value you already used. Remember, it has to remain unique.
Given all these rules, can you think of a good primary key to use in a table?
Look back through the tables in the book. Do any of them have a column that contains truly unique values?
When it comes to creating primary keys, your best bet may be to create a column that contains a unique number. Think of a table with people’s info, but with an additional column containing a number. In the example below, let’s call it
If it weren’t for the
ID column, the records for John Brown would be identical. But in this case, they’re actually two different people. The
ID column makes these records unique. This table is in first normal form.
There’s a big debate in the SQL world about using synthetic, or made-up, primary keys (like the ID column above) versus using natural keys—data that is already in the table (like a VIN number on a car or SSN number). We won’t take sides, but we will discuss primary keys in more detail in Chapter 7.
It’s time to step back and normalize our tables. We need to make our data atomic and add primary keys. Creating a primary key is normally something we do when we write our
CREATE TABLE code.
Fixing Greg’s table Step 1: SELECT all of your data and save it somehow.
Fixing Greg’s table Step 2: Create a new normal table.
Fixing Greg’s table Step 3: INSERT all that old data into the new table, changing each row to match the new table structure.
So now you can drop your old table.
So, we know that Greg’s table isn’t perfect.
It’s not atomic and it has no primary key. But luckily for Greg, you don’t have to live with the old table, and you don’t have to dump your data.
We can add a primary key to Greg’s table and make the columns more atomic using just one new command. But first, let’s take a little trip to the past...
Greg needs a primary key, and after all the talk about atomic data, he realizes there are a few things he could do to make his columns more atomic. Before we look at how to fix the existing table, let’s look at how we could have created the table in the first place!
Here’s the table we created way back in Chapter 1.
But we really want to look at the
CREATE code here, not the fields in the table, so we can figure out what we should have done at the very beginning without having to write the
CREATE statement over again.
SHOW CREATE_TABLE will return a
CREATE TABLE statement that can exactly recreate our table, minus any data in it. This way, you can always see how the table you are looking at could be created. Try it:
SHOW CREATE TABLE my_contacts;
Take a look at the code we used to create the table in The CREATE TABLE we wrote, and the code below that the
SHOW CREATE TABLE my_contacts gives you. They aren’t identical, but if you paste the code below into a
CREATE TABLE command, the end result will be the same. You don’t need to remove the backticks or data settings, but it’s neater if you do.
Although you could make the code neater (by removing the last line and backticks), you can just copy and paste it to create a table.
Here’s the code our
SHOW CREATE TABLE my_contacts gave us. We removed the backticks and last line. At the top of the column list we added a
contact_id column that we’re setting to
NOT NULL, and at the bottom of the list, we’re add a line
PRIMARY KEY, which we set to use our new
contact_id column as the primary key.
What do you think will happen?
Better yet, try it out for yourself and see what happens.
You won’t have to start over; instead, you can use an ALTER statement.
A table with data in it doesn’t have to be dumped, then dropped, then recreated. We can actually change an existing table. But to do that, we’re going to borrow the
ALTER statement and some of its keywords from Chapter 5.
To see what happened to your table, try a
SELECT * from my_contacts;
Will Greg get his phone number column? Turn to Chapter 5 to find out.
 Some people think that RELATIONAL means multiple tables relating to each other. That’s not correct.