Other messiness

As we discussed in this chapter's preface, there are countless ways that a dataset may be messy. There are many other messy situations and solutions that we couldn't discuss at length here. In order that you, dear reader, are not left in the dark regarding custodial solutions, here are some other remedies which you may find helpful along your analytics journey:

OpenRefine

Though OpenRefine (formerly Google Refine) doesn't have anything to do with R per se, it is a sophisticated tool for working with and for cleaning up messy data. Among its numerous, sophisticated capabilities is the capacity to auto-detect misspelled or mispecified categories and fix them at the click of a button.

Regular expressions

Suppose you find that there

Get R: Data Analysis and Visualization now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.