Eliminating duplicated rows with dplyr
To avoid counting duplicate rows, we can use the DISTINCT operation in SQL. Similarly, in dplyr we can eliminate duplicated rows from a given dataset with the distinct() function.
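As a rough illustration of the SQL analogy, the following sketch builds a small hypothetical data frame (the orders object and its values are invented for this example and are not part of the recipe's dataset). It compares a plain row count, the row count after distinct(), and dplyr's n_distinct(), which plays a role similar to COUNT(DISTINCT ...) in SQL.

```r
# Minimal sketch: dplyr's distinct() behaves like SQL's SELECT DISTINCT.
# The 'orders' data frame is a hypothetical toy example, not the recipe's data.
suppressPackageStartupMessages(library(dplyr))

orders <- data.frame(
  Product = c("P001", "P002", "P001", "P003", "P002"),
  User    = c("U01",  "U02",  "U01",  "U01",  "U03"),
  stringsAsFactors = FALSE
)

# A naive count includes the duplicated (P001, U01) row twice
total_rows <- nrow(orders)

# distinct() keeps only the first occurrence of each unique row
unique_rows <- orders %>% distinct() %>% nrow()

# n_distinct() counts unique values directly, similar to COUNT(DISTINCT Product)
unique_products <- n_distinct(orders$Product)

print(c(total = total_rows, unique = unique_rows, products = unique_products))
```

On this toy data the counts are 5 rows in total, 4 distinct rows, and 3 distinct products, which shows how a naive count overstates the number of unique records.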
Getting ready
Ensure that you have completed the Enhancing a data.frame with a data.table recipe, which loads purchase_view.tab and purchase_order.tab into your R environment as both data.frame and data.table objects.
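If you only want to try the steps below without the original files, a tiny hypothetical stand-in for order.dt can be built by hand. This is a sketch under stated assumptions: the values are invented, the User column is assumed, and order.df is a hypothetical name for the data.frame copy. The real purchase_order.tab has its own structure and should be loaded as described in the earlier recipe.

```r
# Hypothetical stand-in for order.dt; the real data comes from purchase_order.tab
# as loaded in the "Enhancing a data.frame with a data.table" recipe.
suppressPackageStartupMessages({
  library(data.table)
  library(dplyr)
})

order.dt <- data.table(
  Product = c("P001", "P002", "P001", "P003", "P002"),  # invented product IDs
  User    = c("U01",  "U02",  "U01",  "U01",  "U03")    # assumed user column
)

# Plain data.frame copy, mirroring the "both data.frame and data.table" setup
order.df <- as.data.frame(order.dt)

str(order.dt)
```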
How to do it…
Perform the following steps to remove duplicate rows with dplyr:
- First, we illustrate how to obtain unique products from the dataset:
> order.dt %>% select(Product) %>% distinct() %>% head(3)
       Product
1: P0006944501
2: P0006018073
3: P0002267974
- We can also remove rows that are duplicated across a combination of multiple columns:
> distinct.product.user.dt <- order.dt %>% select(Product, ...
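The two steps above can be sketched end to end on toy data. This minimal, self-contained example assumes a hypothetical order.dt with Product, User, and Quantity columns; the User and Quantity columns and all values are invented (the User column is only suggested by the name distinct.product.user.dt, since the original code is truncated). It also shows that distinct() accepts column names directly, which gives the same result as select() followed by distinct(), and that .keep_all = TRUE retains the other columns from the first row of each combination.

```r
suppressPackageStartupMessages({
  library(data.table)
  library(dplyr)
})

# Hypothetical toy data standing in for order.dt (values are invented)
order.dt <- data.table(
  Product  = c("P001", "P002", "P001", "P003", "P002"),
  User     = c("U01",  "U02",  "U01",  "U01",  "U03"),
  Quantity = c(1L, 2L, 5L, 1L, 3L)
)

# Step 1: unique products via select() + distinct(), as in the recipe ...
unique.products <- order.dt %>% select(Product) %>% distinct()
# ... or by passing the column straight to distinct()
unique.products2 <- order.dt %>% distinct(Product)

# Step 2: unique (Product, User) combinations across multiple columns
distinct.product.user <- order.dt %>% distinct(Product, User)

# .keep_all = TRUE keeps the remaining columns of the first row per combination
first.order.per.pair <- order.dt %>% distinct(Product, User, .keep_all = TRUE)

print(distinct.product.user)
print(first.order.per.pair)
```

distinct() preserves the order in which each combination first appears, so the (P001, U01) pair keeps the Quantity of its first row (1) rather than the later duplicate (5).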