Chapter 2. Case Study: Data Cleaning

Now that we know how to start RStudio, let’s dive in. We’ll begin with a blow-by-blow account of a sample data analysis in which we read in some data, clean it up, and then format it for further study. We deliberately chose an example that takes us on some detours, as the point of the exercise is to show how many of RStudio’s features can be used along the way to speed up the task. For now, we will postpone an example of the “development” aspect of RStudio.

The data set we look at here comes from a colleague and contains records from a psychology experiment on a colony of naked mole rats. The experimenter is interested in both the behavior of each naked mole rat over time and the social aspect of the colony as a whole.

Each rat wears an RFID chip that allows the researcher to track its motion. The experimental setup consists of 15 chambers (bubbles) in a linear arrangement separated by 14 tubes. Each tube has a gate with a sensor. When a mole rat passes through a tube, the time and gate are recorded. Unfortunately, gates can be missed, and the recording device can erroneously replicate values, so the raw data must be cleaned up.
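To make the two kinds of problems concrete, here is a minimal sketch in R of how they might be detected. It is not the cleaning procedure used later in this chapter. The data frame, its column names (rat, time, gate), and the values are all hypothetical. It also assumes that, because the chambers are arranged in a line, two consecutive readings for the same animal should differ by at most one gate, so a larger jump suggests a missed reading.

```r
# Hypothetical records: one row per gate crossing.
# Column names and values are assumptions made for illustration only.
records <- data.frame(
  rat  = c("A", "A", "A", "A", "B", "B"),
  time = c(10, 10, 25, 40, 12, 30),
  gate = c(1, 1, 2, 4, 7, 8),
  stringsAsFactors = FALSE
)

# Sort by animal and time so consecutive rows trace one animal's path.
records <- records[order(records$rat, records$time), ]

# Drop replicated values: rows repeating an earlier rat/time/gate combination.
cleaned <- records[!duplicated(records[, c("rat", "time", "gate")]), ]

# Within each animal, compute the change in gate number between readings.
# The first reading of each animal has no predecessor, so its jump is 0.
gate_jump <- ave(cleaned$gate, cleaned$rat,
                 FUN = function(g) c(0, abs(diff(g))))

# Assumption: a jump of more than one gate means a gate was missed.
cleaned$missed_gate <- gate_jump > 1

print(cleaned)
```

In this toy data, the repeated reading for rat A at time 10 is dropped, and its move from gate 2 to gate 4 is flagged because gate 3 was apparently skipped.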

This data comes to us in Rich Text Format (RTF). This quasi-text-based format is a bit unusual for data transfer but is presumably what the recording apparatus produces. We will see that this format has some idiosyncrasies that will require us to work a little harder than we normally would to read data into an RStudio session, ...
