O'Reilly logo

Clean Data by Megan Squire

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

An introductory example

To get started, let's sharpen our chef's knife with a small example that integrates the six-step framework and illustrates how to tackle a few simple cleaning issues. This example uses the publicly available Enron e-mail dataset. This is a very famous dataset consisting of e-mail messages sent to, from, and between employees working at the now-defunct Enron Corporation. As part of the U.S. Government investigation into accounting fraud at Enron, the e-mails became part of the public record and are now downloadable by anyone. Researchers in a variety of domains have found the e-mails helpful for studying workplace communication, social networks, and more.

Note

You can read more about Enron and the financial scandal that led ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required