Eliminating Duplicate Rows from a Dataframe
Sometimes a dataframe will contain duplicate rows where all the variables have exactly the same values in two or more rows. Here is a simple example:
dups<-read.table("c:\\temp\\dups.txt",header=T)
dups
var1 var2 var3 var4
1 1 2 3 1
2 1 2 2 1
3 3 2 1 1
4 4 4 2 1
5 3 2 1 1
6 6 1 2 5
7 1 2 3 2
Note that row number 5 is an exact duplicate of row number 3. To create a dataframe with all the duplicate rows stripped out, use the unique function like this:
unique(dups)
var1 var2 var3 var4
1 1 2 3 1
2 1 2 2 1
3 3 2 1 1
4 4 4 2 1
6 6 1 2 5
7 1 2 3 2
Notice that the rows of the new dataframe keep their original row names, so the gap between 4 and 6 shows that row number 5 was the one removed by unique.
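For readers who do not have the file dups.txt to hand, the same result can be reproduced with a dataframe built inline. This is only a sketch: the values below are copied from the listing above, and the name no_dups is an illustrative choice, not something from the original text.

```r
# Rebuild the example data inline (assumption: these values mirror dups.txt)
dups <- data.frame(
  var1 = c(1, 1, 3, 4, 3, 6, 1),
  var2 = c(2, 2, 2, 4, 2, 1, 2),
  var3 = c(3, 2, 1, 2, 1, 2, 3),
  var4 = c(1, 1, 1, 1, 1, 5, 2)
)

# Drop exact duplicate rows; the surviving rows keep their original row names
no_dups <- unique(dups)
nrow(no_dups)        # 6 rows remain out of 7
rownames(no_dups)    # "1" "2" "3" "4" "6" "7"

# Which original row names are missing from the result?
setdiff(rownames(dups), rownames(no_dups))   # "5"
```

Comparing the row names before and after is a quick way to confirm which rows were dropped when the dataframe is too large to inspect by eye.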
To view the duplicated rows in a dataframe (if any), use the duplicated function, which flags the second and any later occurrence of each repeated row:
dups[duplicated(dups),]
var1 var2 var3 var4
5 3 2 1 1
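It can help to look at the logical vector that duplicated returns before using it as a row index. The sketch below rebuilds the example data inline (an assumption standing in for dups.txt) and uses the illustrative names is_dup and all_copies; the fromLast argument is part of base R's duplicated.

```r
# Rebuild the example data inline (assumption: these values mirror dups.txt)
dups <- data.frame(
  var1 = c(1, 1, 3, 4, 3, 6, 1),
  var2 = c(2, 2, 2, 4, 2, 1, 2),
  var3 = c(3, 2, 1, 2, 1, 2, 3),
  var4 = c(1, 1, 1, 1, 1, 5, 2)
)

# One logical value per row: TRUE marks a repeat of an earlier row
is_dup <- duplicated(dups)
is_dup          # FALSE FALSE FALSE FALSE TRUE FALSE FALSE
sum(is_dup)     # 1 duplicated row
which(is_dup)   # 5

# Negating the index keeps the first occurrence of every row,
# which matches the result of unique(dups)
all.equal(dups[!is_dup, ], unique(dups))   # TRUE

# To see every copy of a repeated row (here rows 3 and 5),
# also scan from the bottom of the dataframe upward
all_copies <- dups[is_dup | duplicated(dups, fromLast = TRUE), ]
rownames(all_copies)   # "3" "5"
```

Because duplicated leaves the first occurrence unflagged, combining it with a second pass using fromLast = TRUE is one way to list the original row alongside its copies.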