Opening and transforming data with OpenRefine

OpenRefine originated as GoogleRefine. Google later open sourced the code. It is a great tool to sift through the data quickly, clean it, remove duplicate rows, analyze distributions or trends over time, and more.

In this and the following recipes, we will deal with the realEstate_trans_dirty.csv file that is located in the Data/Chapter1 folder. The file has several issues that, over the course of the following recipes, we will see how to resolve.

First, when read from a text file, OpenRefine defaults the types of data to text; we will deal with data type transformations in this recipe. Otherwise, we will not be able to use facets to explore the numerical columns. Second, there are duplicates in the ...

Get Practical Data Analysis Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.