Eliminating duplicated rows with dplyr

In SQL, the DISTINCT keyword removes duplicate rows so that they are not counted more than once. dplyr provides the same capability through its distinct function, which eliminates duplicated rows from a given dataset.

Getting ready

Ensure that you have completed the "Enhancing a data.frame with a data.table" recipe, which loads purchase_view.tab and purchase_order.tab into your R environment as both data.frame and data.table objects.

How to do it…

Perform the following steps to remove duplicate rows with dplyr:

First, we illustrate how to obtain unique products from the dataset:

```
> order.dt %>% select(Product) %>% distinct() %>% head(3)
       Product
1: P0006944501
2: P0006018073
3: P0002267974
```
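
For readers who do not have purchase_order.tab loaded, the same pattern can be tried on a small stand-in table. This is a minimal sketch: the `orders` data frame and its values are hypothetical and are not taken from the book's dataset.

```r
library(dplyr)

# Hypothetical stand-in for order.dt; the real table comes from purchase_order.tab
orders <- data.frame(
  Product = c("P001", "P001", "P002", "P001"),
  User    = c("U1", "U2", "U1", "U3"),
  stringsAsFactors = FALSE
)

# Keep only the Product column, then drop repeated values
unique_products <- orders %>%
  select(Product) %>%
  distinct()

print(unique_products)
```

Here distinct() keeps the first occurrence of each product, so the four input rows collapse to one row per product.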

We can also remove rows that are duplicated across multiple columns:
```
> distinct.product.user.dt <- order.dt %>% select(Product, ...
```
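
Because the command above is truncated in the source, the following is only an illustrative sketch of multi-column deduplication, not a reconstruction of the original command. The `orders` data frame, its `Quantity` column, and all values are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for order.dt with a repeated (Product, User) pair
orders <- data.frame(
  Product  = c("P001", "P001", "P001", "P002"),
  User     = c("U1", "U1", "U2", "U1"),
  Quantity = c(1, 3, 2, 1),
  stringsAsFactors = FALSE
)

# Option 1: select the columns of interest, then deduplicate
pairs_selected <- orders %>%
  select(Product, User) %>%
  distinct()

# Option 2: pass the columns straight to distinct()
pairs_direct <- orders %>%
  distinct(Product, User)

print(pairs_selected)
print(pairs_direct)
```

Both options treat a row as a duplicate only when every listed column matches, so the repeated (P001, U1) pair is reduced to a single row while (P001, U2) and (P002, U1) are kept.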

R for Data Science Cookbook by Yu-Wei Chiu - David Chiu