Eliminating duplicated rows with dplyr

Duplicate rows can inflate counts and other aggregates. SQL handles this with the DISTINCT keyword; dplyr offers the distinct function, which eliminates duplicated rows from a given dataset.
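As a rough illustration of the parallel with SQL, the following sketch applies distinct to a small made-up data frame. The orders object and its Product and Quantity columns are assumptions for illustration only; they are not the recipe's purchase_order.tab data.

```r
library(dplyr)

# Hypothetical toy data (not the recipe's dataset); rows 1 and 2 are identical
orders <- data.frame(
  Product  = c("A", "A", "B", "B", "C"),
  Quantity = c(1, 1, 2, 3, 1),
  stringsAsFactors = FALSE
)

# Roughly comparable to SQL: SELECT DISTINCT Product, Quantity FROM orders
unique.rows <- orders %>% distinct()

nrow(orders)       # number of rows before removing duplicates
nrow(unique.rows)  # number of rows after removing duplicates
```

With no column arguments, distinct compares entire rows, so only rows that match in every column are collapsed.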

Getting ready

Ensure that you have completed the "Enhancing a data.frame with a data.table" recipe, which loads purchase_view.tab and purchase_order.tab into your R environment as both data.frame and data.table objects.

How to do it…

Perform the following steps to remove duplicate rows with dplyr:

  1. First, we illustrate how to obtain unique products from the dataset:
    > order.dt %>% select(Product) %>% distinct() %>% head(3)
           Product
    1: P0006944501
    2: P0006018073
    3: P0002267974
    
  2. We can also remove rows that are duplicated across a combination of multiple columns:
    > distinct.product.user.dt <- order.dt %>% select(Product, ...
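
As a sketch of deduplicating on several columns at once, the following uses a small made-up data frame rather than the recipe's order.dt. The toy.orders object and its Product, User, and Price columns are hypothetical, chosen only to show the pattern of selecting columns and then calling distinct.

```r
library(dplyr)

# Hypothetical toy data; column names are assumptions, not the recipe's schema
toy.orders <- data.frame(
  Product = c("P1", "P1", "P1", "P2", "P2"),
  User    = c("U1", "U1", "U2", "U1", "U1"),
  Price   = c(10, 12, 10, 5, 5),
  stringsAsFactors = FALSE
)

# Keep only the two columns of interest, then drop repeated combinations
pairs <- toy.orders %>% select(Product, User) %>% distinct()

# Naming the columns inside distinct() yields the same combinations
pairs.direct <- toy.orders %>% distinct(Product, User)

# .keep_all = TRUE retains the other columns from the first row of each pair
first.rows <- toy.orders %>% distinct(Product, User, .keep_all = TRUE)
```

Naming the columns inside distinct avoids the separate select step, while .keep_all = TRUE is useful when the remaining columns of the first matching row are still needed.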
