Using OpenRefine by Max De Wilde, Ruben Verborgh

Chapter 4. Linking Datasets

Your dataset is not an island. Somewhere, related datasets exist, even in places where you might not expect them. For instance, if your dataset has a Country of Origin column, then it is related to a geographical database that lists the total area per country. An Author column in a book dataset relates to a list of authors with biographical data. All datasets have such connections, yet you might not know about them, and neither does the computer that contains your dataset. For example, the record for The Picture of Dorian Gray might list "Wilde, O." as its author, whereas a biographical dataset might only have an entry for "Oscar Wilde". Even though they point to the same person, the string values are different, and ...
