Removing duplicate cases

We sometimes end up with duplicate cases in our datasets and want to retain only one among the duplicates.

Getting ready

Create a sample data frame:

> salary <- c(20000, 30000, 25000, 40000, 30000, 34000, 30000)
> family.size <- c(4,3,2,2,3,4,3)
> car <- c("Luxury", "Compact", "Midsize", "Luxury", "Compact", "Compact", "Compact")
> prospect <- data.frame(salary, family.size, car)

How to do it...

The unique() function can do the job. It takes a vector or data frame as an argument and returns an object of the same type as its argument but with duplicates removed.

Get unique values:

> prospect.cleaned <- unique(prospect)
> nrow(prospect)
[1] 7
> nrow(prospect.cleaned)
[1] 5

How it works...

The unique() function takes a vector or data ...

Get R: Recipes for Analysis, Visualization and Machine Learning now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.