Eliminating Duplicate Rows from a Dataframe

Sometimes a dataframe contains duplicate rows: two or more rows in which every variable has exactly the same value. Here is a simple example:

dups <- read.table("c:\\temp\\dups.txt", header = TRUE)
dups

   var1  var2  var3  var4
1     1     2     3     1
2     1     2     2     1
3     3     2     1     1
4     4     4     2     1
5     3     2     1     1
6     6     1     2     5
7     1     2     3     2
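
If the file dups.txt is not to hand, the same dataframe can be built inline. The following is a minimal sketch that assumes the file holds exactly the seven rows printed above; it uses the text argument of read.table in place of a file path.

```r
# Recreate the example dataframe without needing c:\temp\dups.txt.
# Assumption: the file contains exactly the seven rows shown in the listing.
dups <- read.table(header = TRUE, text = "
var1 var2 var3 var4
1 2 3 1
1 2 2 1
3 2 1 1
4 4 2 1
3 2 1 1
6 1 2 5
1 2 3 2
")

# Print the dataframe; row names are assigned automatically as 1 to 7
dups
```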

Note that row number 5 is an exact duplicate of row number 3. To create a dataframe with all the duplicate rows stripped out, use the unique function like this:

unique(dups)

   var1  var2  var3  var4
1     1     2     3     1
2     1     2     2     1
3     3     2     1     1
4     4     4     2     1
6     6     1     2     5
7     1     2     3     2

Notice that the row names in the new dataframe are the same as in the original, so you can see at a glance that row number 5 was removed by unique.
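
Because the original row names are retained, the result is numbered 1, 2, 3, 4, 6, 7. If consecutive numbering is preferred, the row names can be reset. The sketch below rebuilds the example data inline (an assumption standing in for the file) and also shows that unique gives the same result as dropping the rows flagged by duplicated; the names no_dups and same are illustrative only.

```r
# Example data rebuilt inline (assumed to match the contents of dups.txt)
dups <- read.table(header = TRUE, text = "
var1 var2 var3 var4
1 2 3 1
1 2 2 1
3 2 1 1
4 4 2 1
3 2 1 1
6 1 2 5
1 2 3 2
")

# Strip the duplicate rows; row names 1, 2, 3, 4, 6, 7 are kept
no_dups <- unique(dups)

# The same result, obtained by dropping the rows that duplicated() flags
same <- dups[!duplicated(dups), ]
identical(no_dups, same)

# Reset the row names so that the rows are numbered consecutively
rownames(no_dups) <- NULL
no_dups
```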

To view the duplicate rows in a dataframe (if any), use the duplicated function, which flags the second and any later copies of a row but not the first:

dups[duplicated(dups),]

    var1  var2  var3  var4
5      3     2     1     1
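
Only row 5 appears here, because row 3 is the first occurrence and is not itself counted as a duplicate. To count the repeated rows, or to list every copy including the first, something like the following sketch could be used. It relies on the fromLast argument of duplicated; the example data are rebuilt inline as an assumption, and the names n_repeats and all_copies are illustrative only.

```r
# Example data rebuilt inline (assumed to match the contents of dups.txt)
dups <- read.table(header = TRUE, text = "
var1 var2 var3 var4
1 2 3 1
1 2 2 1
3 2 1 1
4 4 2 1
3 2 1 1
6 1 2 5
1 2 3 2
")

# Count the rows that repeat an earlier row
n_repeats <- sum(duplicated(dups))
n_repeats

# Flag a row if it matches an earlier row OR a later row,
# so that the first occurrence is listed along with its copies
all_copies <- dups[duplicated(dups) | duplicated(dups, fromLast = TRUE), ]
all_copies
```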
