Cover by Joseph Adler

Combining Data Sets

Let’s start with one of the most common obstacles to data analysis: working with data that’s stored in two different places. For example, suppose that you wanted to look at batting statistics for baseball players by age. In most baseball data sources (like the Baseball Databank data), player information (like ages) is kept in different files from performance data (like batting statistics). So you would need to combine two files to do this analysis. This section discusses several tools in R used for combining data sets.

Pasting Together Data Structures

R provides several functions that allow you to paste together multiple data structures into a single structure.

Paste

The simplest of these functions is paste. The paste function allows you to concatenate multiple character vectors into a single vector. (If you concatenate a vector of another type, it will be coerced to a character vector first.)

> x <- c("a", "b", "c", "d", "e")
> y <- c("A", "B", "C", "D", "E")
> paste(x,y)
[1] "a A" "b B" "c C" "d D" "e E"
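The coercion described above can be seen by pasting a non-character vector. The following is a minimal sketch; the vectors `n` and `s` are hypothetical examples, not data from the original text.

```r
# Hypothetical example data (not from the original text)
n <- 1:3                  # an integer vector
s <- c("x", "y", "z")     # a character vector

# paste coerces n to character before concatenating
labels <- paste(n, s)
print(labels)
# [1] "1 x" "2 y" "3 z"

# The result is always a character vector
is.character(labels)
# [1] TRUE
```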

By default, values are separated by a space; you can specify another separator (or none at all) with the sep argument:

> paste(x, y, sep="-")
[1] "a-A" "b-B" "c-C" "d-D" "e-E"
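To illustrate the "none at all" case, an empty string can be passed as sep. The sketch below reuses the x and y vectors defined earlier; the variable name `joined` is introduced only for illustration. Base R also provides paste0, which behaves like paste with sep="".

```r
x <- c("a", "b", "c", "d", "e")
y <- c("A", "B", "C", "D", "E")

# An empty separator joins the values with nothing between them
joined <- paste(x, y, sep = "")
print(joined)
# [1] "aA" "bB" "cC" "dD" "eE"

# paste0 is shorthand for paste(..., sep = "")
identical(joined, paste0(x, y))
# [1] TRUE
```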

If you would like all of the values in the returned vector to be concatenated with one another (to return just a single string), then specify a value for the collapse argument. The value of collapse is used as the separator between the elements of the vector when they are joined into that single string:

> paste(x, y, sep="-", collapse="#")
[1] "a-A#b-B#c-C#d-D#e-E"
...
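The collapse argument can also be used with a single vector, which is a common way to turn a vector into one delimited string. This is a minimal sketch reusing the x vector from above; the name `one_string` and the ", " delimiter are chosen only for illustration.

```r
x <- c("a", "b", "c", "d", "e")

# With one input vector, collapse joins its elements into a single string
one_string <- paste(x, collapse = ", ")
print(one_string)
# [1] "a, b, c, d, e"

# The result has length 1
length(one_string)
# [1] 1
```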
