Chapter 13. Cleaning Data: Impose order

Your data is useless...

...if it has messy structure. And a lot of people who collect data do a crummy job of maintaining a neat structure. If your data’s not neat, you can’t slice it or dice it, run formulas on it, or even really see it. You might as well just ignore it completely, right? Actually, you can do better. With a clear vision of how you need it to look and a few text manipulation tools, you can take the funkiest, craziest mess of data and whip it into something useful.

Just got a client list from a defunct competitor

Your newest client, Head First Head Hunters, just received a list of job seekers from a defunct competitor. They had to spend big bucks to get it, but it’s hugely valuable. The people on this list are the best of the best, the most employable people around.

This list could be a gold mine...

...too bad the data is a mess! In its current form, there’s not much they can do with this data. That’s why they called you. Can you help?

The dirty secret of data analysis

The dirty secret of data analysis is that, as an analyst, you might spend more time cleaning data than analyzing it. Data often doesn’t arrive perfectly organized, so you’ll have to do some heavy text manipulation ...