Eliminating Duplicates by Adding an Index
Problem
A table has duplicates and you’d like to get rid of them.
Solution
One way to do this is to create a unique index on the column or columns containing duplicates.
Discussion
If MySQL discovers duplicate key values when you try to create a
PRIMARY
KEY
or a
UNIQUE
index, it aborts the
ALTER
TABLE
operation. To
ignore the duplicates and proceed anyway, use
ALTER
IGNORE
TABLE
rather than ALTER
TABLE
. The
IGNORE
keyword tells MySQL to retain the first
row containing a duplicated key value and discard the others. This
is, in fact, a useful way to eliminate duplicates in a column or set
of columns: just create a unique-valued index and let MySQL throw
away the duplicates. (If you need to identify which key values are
duplicated, though, that’s not a suitable technique.
See Recipe 14.4 for information on duplicate
identification.)
To see how IGNORE
works to eliminate duplicates,
use mytbl
, which now has no indexes if
you’ve issued the index-modification statements
shown earlier. First, insert some duplicate values into the table:
mysql>INSERT INTO mytbl (i,c) VALUES(1,'a'),(1,'a'),(1,NULL),(1,NULL),
->(2,'a'),(2,'a'),(2,'b'),(2,'b');
mysql>SELECT * FROM mytbl;
+---+------+ | i | c | +---+------+ | 1 | a | | 1 | a | | 1 | NULL | | 1 | NULL | | 2 | a | | 2 | a | | 2 | b | | 2 | b | +---+------+
Now suppose you want to create a unique-valued index comprising the
i
and c
columns. A
PRIMARY
KEY
cannot be used
here, because c
contains NULL
values. ...
Get MySQL Cookbook now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.