Transformations

Sometimes a variable in your source data isn’t quite right: it may have the wrong type, or you may need a value derived from other variables. This section explains how to change a variable in a data frame.

Reassigning Variables

One of the most convenient ways to redefine a variable in a data frame is to use the assignment operator. For example, suppose that you want to change the type of a variable in the dow30 data frame that we created above. When read.csv imported this data, it interpreted the “Date” field as a character string and converted it to a factor (in R 4.0.0 and later, read.csv does this only if you pass stringsAsFactors=TRUE; by default the field stays a character vector):

> class(dow30$Date)
[1] "factor"

Factors are fine for some things, but we could better represent the date field as a Date object. (That would create a proper ordering on dates and allow us to extract information from them.) Luckily, Yahoo! Finance prints dates in the default date format for R (year-month-day, as in 2009-09-21), so we can transform these values into Date objects using as.Date without specifying a format (see the help file for as.Date for more information). So let’s change this variable within the data frame to use Date objects:

> dow30$Date <- as.Date(dow30$Date)
> class(dow30$Date)
[1] "Date"
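
To see what the conversion buys you, here is a minimal sketch on a small stand-in data frame. The name prices and every value in it are made up for illustration; they are not taken from dow30. The sketch passes stringsAsFactors=TRUE so that the Date column starts out as a factor, as in the example above.

```r
# Hypothetical stand-in for dow30: made-up rows, not real quotes.
prices <- data.frame(
  symbol = c("AAA", "AAA", "AAA"),
  Date   = c("2009-09-03", "2009-09-01", "2009-09-02"),
  Close  = c(10.5, 10.1, 10.3),
  stringsAsFactors = TRUE   # force the factor behavior described above
)

class(prices$Date)                    # "factor"

# Reassign the column in place; as.Date accepts a factor directly.
prices$Date <- as.Date(prices$Date)
class(prices$Date)                    # "Date"

# Dates now sort chronologically rather than by factor level.
prices <- prices[order(prices$Date), ]

# Date objects also support range queries, formatting, and differences.
date_span   <- range(prices$Date)
years       <- format(prices$Date, "%Y")
day_spacing <- diff(prices$Date)
```

After the conversion, order, range, format, and diff all treat the column as calendar dates, which is the “proper ordering” and “extract information” mentioned above.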

It’s also possible to make other changes to data frames with the assignment operator. For example, suppose that we wanted to define a new midpoint variable that is the mean of the high and low prices. Assigning to a column name that doesn’t exist yet adds a new column, so we can add this variable with the same notation:

> dow30$mid <- (dow30$High + dow30$Low) / 2
> names(dow30)
[1] "symbol"    "Date"      "Open"      "High"      "Low"
[6] "Close"     "Volume"    "Adj.Close" "mid"
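
The same pattern can be tried on a tiny data frame without loading any data. This is only a sketch: the name quotes and its values are hypothetical, chosen so the arithmetic is easy to check by hand.

```r
# Hypothetical two-row data frame; the values are made up.
quotes <- data.frame(
  symbol = c("AAA", "BBB"),
  High   = c(12, 31),
  Low    = c(10, 29)
)

# The arithmetic is vectorized, so one assignment fills every row.
quotes$mid <- (quotes$High + quotes$Low) / 2

names(quotes)   # "symbol" "High" "Low" "mid"
quotes$mid      # 11 30
```

Because the column mid did not exist before the assignment, R appends it as the last column of the data frame.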

The Transform Function

A convenient function for changing variables ...
