A first look at data

Before passing our data to regression algorithms, we need to take a first look at what we've imported into the R environment to check for issues. Raw data is often messy and poorly formatted. In other cases, it may lack the details our study requires.
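As a rough illustration, a first look in base R can use `str()`, `head()`, `summary()` and a count of missing values. The data frame below (`raw_data` and its columns) is a hypothetical stand-in for imported data, not data from this book; in practice it would come from `read.csv()` or a similar import function.

```r
# Hypothetical raw data standing in for an imported file.
raw_data <- data.frame(
  id     = c(1, 2, 3, 4),
  height = c("172", "165", NA, "180"),  # numbers stored as text: a typical formatting issue
  weight = c(70.5, NA, 82.1, 77.0),
  stringsAsFactors = FALSE
)

str(raw_data)              # column types: height shows up as chr, not num
head(raw_data)             # first rows of the dataset
summary(raw_data)          # per-column summaries, with NA counts for numeric columns
colSums(is.na(raw_data))   # number of missing values in each column
```

Here `str()` reveals that `height` was read as text, and the missing-value count shows one `NA` in each of `height` and `weight`: both are things to fix before any model is fitted.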

Correcting data in place can be destructive, because the original values may be overwritten with no way to restore them.
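A minimal sketch of avoiding this in R is to assign the imported data to a working copy and apply every correction only to that copy. The names `original_data` and `clean_data`, and the `"n/a"` placeholder, are hypothetical and used only for illustration.

```r
# Hypothetical imported data with a messy column.
original_data <- data.frame(
  id     = 1:3,
  height = c("172", "165", "n/a"),
  stringsAsFactors = FALSE
)

# R uses copy-on-modify semantics: changing clean_data never alters original_data.
clean_data <- original_data

# Example correction: mark the "n/a" placeholder as missing, then convert to numeric.
clean_data$height[clean_data$height == "n/a"] <- NA
clean_data$height <- as.numeric(clean_data$height)

str(original_data)  # height is still character and "n/a" is preserved
str(clean_data)     # height is now numeric, with one NA
```

If a correction turns out to be wrong, `clean_data` can simply be rebuilt from `original_data`.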

To get started, it's good practice to keep the original data intact. To do this, we will perform every change on a copy of the dataset. Putting the data in order is the first step, and it makes data cleaning easier. This raises a question: when can we say that our data is tidy? According to Hadley Wickham, a dataset ...
