Eliminating Duplicate Rows from a Dataframe

Sometimes a dataframe contains duplicate rows, that is, two or more rows in which every variable has exactly the same value. Here is a simple example:

dups <- read.table("c:\\temp\\dups.txt", header = TRUE)
dups

   var1  var2  var3  var4
1     1     2     3     1
2     1     2     2     1
3     3     2     1     1
4     4     4     2     1
5     3     2     1     1
6     6     1     2     5
7     1     2     3     2

Note that row number 5 is an exact duplicate of row number 3. To create a dataframe with all the duplicate rows stripped out, use the unique function like this:

unique(dups)

   var1  var2  var3  var4
1     1     2     3     1
2     1     2     2     1
3     3     2     1     1
4     4     4     2     1
6     6     1     2     5
7     1     2     3     2

Notice that the row names in the new dataframe are the same as in the original, so you can see at a glance that row 5 is the one that unique removed.
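For readers who do not have the file dups.txt, the following is a minimal sketch that rebuilds the same seven rows inline with data.frame and then checks which row names survive unique. The object name no_dups is introduced here purely for illustration.

```r
# Rebuild the example data inline so no external file is needed
dups <- data.frame(
  var1 = c(1, 1, 3, 4, 3, 6, 1),
  var2 = c(2, 2, 2, 4, 2, 1, 2),
  var3 = c(3, 2, 1, 2, 1, 2, 3),
  var4 = c(1, 1, 1, 1, 1, 5, 2)
)

# Strip out the duplicate rows (no_dups is an illustrative name)
no_dups <- unique(dups)

nrow(dups)          # 7
nrow(no_dups)       # 6
rownames(no_dups)   # "1" "2" "3" "4" "6" "7"

# Row names present in the original but absent from the result
setdiff(rownames(dups), rownames(no_dups))   # "5"
```

Comparing row names with setdiff is a convenient way to find the removed rows when the dataframe is too long to inspect by eye.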

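To view the rows that are duplicates in a dataframe (if any), use the duplicated function. It flags only the second and later copies of a row, so the first occurrence (row 3 here) is not shown: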

dups[duplicated(dups),]

    var1  var2  var3  var4
5      3     2     1     1
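It can help to look at the logical vector that duplicated returns before using it as a row index. The sketch below rebuilds the example data inline, prints that vector, and then uses the fromLast argument of duplicated to show every copy of a repeated row rather than only the later ones. The name all_copies is an illustrative choice, not something from the text above.

```r
# Same seven rows as the example, built inline
dups <- data.frame(
  var1 = c(1, 1, 3, 4, 3, 6, 1),
  var2 = c(2, 2, 2, 4, 2, 1, 2),
  var3 = c(3, 2, 1, 2, 1, 2, 3),
  var4 = c(1, 1, 1, 1, 1, 5, 2)
)

# One logical value per row: TRUE marks a repeat of an earlier row
duplicated(dups)
# FALSE FALSE FALSE FALSE  TRUE FALSE FALSE

# Scanning from the end as well flags the first copy too,
# so combining both directions marks every member of a duplicate set
all_copies <- duplicated(dups) | duplicated(dups, fromLast = TRUE)
dups[all_copies, ]   # rows 3 and 5

# Negating the vector keeps the first occurrence of each row,
# which selects the same rows that unique(dups) returns
dups[!duplicated(dups), ]
```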
