
Clean Data by Megan Squire


Summary

In this chapter, we used a sample dataset, a collection of tweets called Sentiment140, to learn how to clean and manipulate data in a relational database management system (RDBMS). We performed a few basic cleaning procedures in Excel, and then we reviewed how to get the data out of a CSV file and into the database. From that point on, the remaining cleaning procedures were performed inside the RDBMS itself. We learned how to convert strings into proper dates, and then we worked on extracting three kinds of data from within the tweet text, ultimately moving these extracted values to new, clean tables. Next, we learned how to create a lookup table of values that were stored inefficiently, thus allowing us to update the original table with ...
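The date conversion and extraction steps summarized above can be sketched roughly as follows. This is a minimal illustration, not the chapter's own code: it uses Python with an in-memory SQLite database in place of the chapter's RDBMS, and it assumes that tweet dates look like "Mon Apr 06 22:19:45 PDT 2009" and that the three kinds of extracted data are user mentions, hashtags, and URLs. The table names, column names, function names, and regular expressions are all hypothetical.

```python
import re
import sqlite3
from datetime import datetime

# Assumed patterns for illustration only; real tweets contain edge cases
# (e-mail addresses, URLs with '#' fragments) that these do not handle.
MENTION_RE = re.compile(r"@(\w+)")
HASHTAG_RE = re.compile(r"#(\w+)")
URL_RE = re.compile(r"https?://\S+")


def parse_tweet_date(raw):
    """Turn an assumed 'Mon Apr 06 22:19:45 PDT 2009' string into 'YYYY-MM-DD HH:MM:SS'."""
    parts = raw.split()
    # Drop the time zone abbreviation (fifth token); parsing it portably is unreliable.
    without_tz = " ".join(parts[:4] + parts[5:])
    parsed = datetime.strptime(without_tz, "%a %b %d %H:%M:%S %Y")
    return parsed.isoformat(sep=" ")


def extract_entities(text):
    """Return the mentions, hashtags, and URLs found in one tweet body."""
    return {
        "mention": MENTION_RE.findall(text),
        "hashtag": HASHTAG_RE.findall(text),
        "url": URL_RE.findall(text),
    }


def clean_into_tables(conn, rows):
    """Load (id, raw_date, body) rows, storing clean dates and extracted values."""
    conn.execute(
        "CREATE TABLE tweets (id INTEGER PRIMARY KEY, created_at TEXT, body TEXT)"
    )
    # One combined table keeps the sketch short; separate tables per kind work too.
    conn.execute("CREATE TABLE entities (tweet_id INTEGER, kind TEXT, value TEXT)")
    for tweet_id, raw_date, body in rows:
        conn.execute(
            "INSERT INTO tweets VALUES (?, ?, ?)",
            (tweet_id, parse_tweet_date(raw_date), body),
        )
        for kind, values in extract_entities(body).items():
            conn.executemany(
                "INSERT INTO entities VALUES (?, ?, ?)",
                [(tweet_id, kind, value) for value in values],
            )
    conn.commit()
```

To try it, open a connection with `sqlite3.connect(":memory:")` and pass it to `clean_into_tables` along with a list of `(id, raw_date, body)` tuples. The original tweet text stays untouched in the `tweets` table, while each extracted value becomes its own row that can be queried or counted independently.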
