Using OpenRefine by Max De Wilde, Ruben Verborgh

Recipe 6 – removing matching rows

In this recipe, you will learn how to suppress problematic rows that have been previously singled out through the use of facets and filters.

Detecting duplicates or flagging redundant rows is fine, but it is only part of the job. At some point, you will want to cross the line between data profiling (or analysis) and data cleaning. In practice, this means that rows identified as inappropriate during the diagnosis phase (and probably flagged as such) need to be removed from the dataset, since they are detrimental to its quality.
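The idea of removing previously flagged rows can be sketched outside OpenRefine in a few lines of plain Python. This is only an illustration under stated assumptions, not how OpenRefine is implemented: the row layout, the `flagged` column, and the `remove_matching_rows` helper are all hypothetical. The sketch also refuses to run without a filter, a design choice of this example that mirrors the recipe's warning that removing rows with no facet or filter in place removes everything.

```python
from typing import Callable, Optional

# Hypothetical row representation: one dict per row, column name -> value.
Row = dict


def remove_matching_rows(
    rows: list[Row],
    matches: Optional[Callable[[Row], bool]] = None,
) -> list[Row]:
    """Return only the rows that do NOT match the filter.

    Unlike a tool that treats "no filter" as "match everything", this
    sketch raises an error instead, so the dataset cannot be emptied
    by accident.
    """
    if matches is None:
        raise ValueError("no facet or filter in place: refusing to remove every row")
    return [row for row in rows if not matches(row)]


# Made-up sample data: row 2 was flagged during the diagnosis phase.
rows = [
    {"id": 1, "title": "Lamp", "flagged": False},
    {"id": 2, "title": "Lamp", "flagged": True},
    {"id": 3, "title": "Chair", "flagged": False},
]

# The "filter" here selects flagged rows; those are the ones removed.
kept = remove_matching_rows(rows, matches=lambda row: row["flagged"])
print([row["id"] for row in kept])  # prints [1, 3]
```

The function returns a new list and leaves the original untouched, which makes it easy to compare row counts before and after the removal.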

To remove rows, be sure to have a facet or filter in place first; otherwise, you will remove all rows in the dataset. Let's start from the clean project again (import it ...