Merging Two Dataframes
Suppose we have two dataframes, the first containing information on plant life forms and the second containing information on flowering time. We want to produce a single dataframe showing information on both life form and flowering time. Both dataframes contain variables for genus name and species name:
(lifeforms<-read.table("c:\\temp\\lifeforms.txt",header=T))
Genus species lifeform
1 Acer platanoides tree
2 Acer palmatum tree
3 Ajuga reptans herb
4 Conyza sumatrensis annual
5 Lamium album herb
(flowering<-read.table("c:\\temp\\fltimes.txt",header=T))
Genus species flowering
1 Acer platanoides May
2 Ajuga reptans June
3 Brassica napus April
4 Chamerion angustifolium July
5 Conyza bilbaoana August
6 Lamium album January
Because at least one of the variable names is identical in the two dataframes (in this case, two variables are identical, namely Genus and species) we can use the simplest of all merge commands:
merge(flowering,lifeforms)
Genus species flowering lifeform
1 Acer platanoides May tree
2 Ajuga reptans June herb
3 Lamium album January herb
The important point to note is that the merged dataframe contains only those rows which had complete entries in both dataframes. Two rows from the lifeforms dataframe were excluded because there were no flowering time data for them (Acer palmatum and Conyza sumatrensis), and three rows from the flowering-time dataframe were excluded because there were no lifeform data for them (Chamerion angustifolium, ...
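The row-dropping behaviour described above can be checked with a small self-contained sketch. It rebuilds the two dataframes inline with data.frame rather than reading the text files, which is an assumption made so the example runs anywhere. The object names inner, inner_by and outer are illustrative only. The all=TRUE call is included as a contrast and is not part of the example above: it keeps unmatched rows from both dataframes and fills the gaps with NA.

```r
# Rebuild the two example dataframes inline (assumption: same contents as
# lifeforms.txt and fltimes.txt shown above), so no external files are needed.
lifeforms <- data.frame(
  Genus    = c("Acer", "Acer", "Ajuga", "Conyza", "Lamium"),
  species  = c("platanoides", "palmatum", "reptans", "sumatrensis", "album"),
  lifeform = c("tree", "tree", "herb", "annual", "herb"),
  stringsAsFactors = FALSE
)
flowering <- data.frame(
  Genus     = c("Acer", "Ajuga", "Brassica", "Chamerion", "Conyza", "Lamium"),
  species   = c("platanoides", "reptans", "napus", "angustifolium",
                "bilbaoana", "album"),
  flowering = c("May", "June", "April", "July", "August", "January"),
  stringsAsFactors = FALSE
)

# Default merge: joins on every column name the two dataframes share
# (here Genus and species) and keeps only rows present in both.
inner <- merge(flowering, lifeforms)

# The same merge with the key columns spelled out explicitly.
inner_by <- merge(flowering, lifeforms, by = c("Genus", "species"))

# Contrast: all = TRUE keeps unmatched rows from both sides, with NA
# where the other dataframe had no entry.
outer <- merge(flowering, lifeforms, all = TRUE)

nrow(inner)   # 3 rows: only the species found in both dataframes
nrow(outer)   # 8 rows: every Genus/species pair from either dataframe
```

Comparing nrow(inner) with nrow(outer) makes the five excluded rows visible: in outer they appear with NA in either the flowering or the lifeform column.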