Chapter 14. Handling Duplicates

Introduction

Tables or result sets sometimes contain duplicate records. In some cases this is acceptable. For example, if you conduct a web poll that records dates and client IP numbers along with the votes, duplicate records may be allowable, because it’s possible for large numbers of votes to appear to originate from the same IP number for an Internet service that routes traffic from its customers through a single proxy host. In other cases, duplicates will be unacceptable, and you’ll want to take steps to avoid them. Operations related to handling of duplicate records include the following:

  • Counting the number of duplicates to determine whether they occur and to what extent.

  • Identifying duplicated values (or the records containing them) so you can see what they are and where they occur.

  • Eliminating duplicates to ensure that each record is unique. This may involve removing rows from a table to leave only unique records. Or it may involve selecting a result set in such a way that no duplicates appear in the output. (For example, to display a list of the states in which you have customers, you probably wouldn’t want a long list of state names from all customer records. A list showing each state name only once suffices and is easier to understand.)

  • Preventing duplicates from being created within a table in the first place. If each record in a table is intended to represent a single entity (such as a person, an item in a catalog, or a specific observation ...

Get MySQL Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.