Merging Two Dataframes

Suppose we have two dataframes, the first containing information on plant life forms and the second containing information on time of flowering. We want to produce a single dataframe showing both life form and flowering time. Both dataframes contain variables for genus name and species name:

(lifeforms<-read.table("c:\\temp\\lifeforms.txt",header=TRUE))

    Genus      species  lifeform
1    Acer  platanoides      tree
2    Acer     palmatum      tree
3   Ajuga      reptans      herb
4  Conyza  sumatrensis    annual
5  Lamium        album      herb

(flowering<-read.table("c:\\temp\\fltimes.txt",header=TRUE))

        Genus        species      flowering
1        Acer    platanoides            May
2       Ajuga        reptans           June
3    Brassica          napus          April
4   Chamerion  angustifolium           July
5      Conyza      bilbaoana         August
6      Lamium          album        January

Because at least one of the variable names is identical in the two dataframes (in this case two are, namely Genus and species), we can use the simplest of all merge commands:

merge(flowering,lifeforms)

    Genus      species  flowering  lifeform
1    Acer  platanoides        May      tree
2   Ajuga      reptans       June      herb
3  Lamium        album    January      herb
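
If the two text files are not to hand, the same merge can be sketched with dataframes typed in directly. This is only an illustrative reconstruction from the printed output above. The explicit by argument is an addition, included to spell out what merge matches on when none is given, namely every column name common to both dataframes.

```r
# Rebuild the two dataframes from the printed output (no files needed)
lifeforms <- data.frame(
  Genus    = c("Acer", "Acer", "Ajuga", "Conyza", "Lamium"),
  species  = c("platanoides", "palmatum", "reptans", "sumatrensis", "album"),
  lifeform = c("tree", "tree", "herb", "annual", "herb"),
  stringsAsFactors = FALSE
)
flowering <- data.frame(
  Genus     = c("Acer", "Ajuga", "Brassica", "Chamerion", "Conyza", "Lamium"),
  species   = c("platanoides", "reptans", "napus", "angustifolium",
                "bilbaoana", "album"),
  flowering = c("May", "June", "April", "July", "August", "January"),
  stringsAsFactors = FALSE
)

# Naming the key columns explicitly; with no 'by', merge uses the
# column names shared by both dataframes, which here are the same two
both <- merge(flowering, lifeforms, by = c("Genus", "species"))
print(both)
```

Writing out by is optional here, but it guards against surprises if a later version of either file gains another column whose name happens to appear in both.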

The important point to note is that the merged dataframe contains only those rows that had entries in both dataframes. Two rows from the lifeforms dataframe were excluded because there were no flowering-time data for them (Acer palmatum and Conyza sumatrensis), and three rows from the flowering-time dataframe were excluded because there were no lifeform data for them (Chamerion angustifolium, ...
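
One way to check which rows a default merge drops is to ask merge to keep the unmatched rows and then look for the missing values it fills in. The sketch below assumes the two dataframes are retyped from the printed output rather than read from file. The all = TRUE argument is a standard option of merge, but the object names everything, no_flowering and no_lifeform are hypothetical, chosen only for illustration.

```r
# Dataframes retyped from the printed output above
lifeforms <- data.frame(
  Genus    = c("Acer", "Acer", "Ajuga", "Conyza", "Lamium"),
  species  = c("platanoides", "palmatum", "reptans", "sumatrensis", "album"),
  lifeform = c("tree", "tree", "herb", "annual", "herb"),
  stringsAsFactors = FALSE
)
flowering <- data.frame(
  Genus     = c("Acer", "Ajuga", "Brassica", "Chamerion", "Conyza", "Lamium"),
  species   = c("platanoides", "reptans", "napus", "angustifolium",
                "bilbaoana", "album"),
  flowering = c("May", "June", "April", "July", "August", "January"),
  stringsAsFactors = FALSE
)

# all = TRUE keeps rows found in only one dataframe, padding with NA
everything <- merge(flowering, lifeforms, all = TRUE)

# Rows that the default merge would have excluded
no_flowering <- everything[is.na(everything$flowering), ]  # only in lifeforms
no_lifeform  <- everything[is.na(everything$lifeform), ]   # only in flowering

print(no_flowering[, c("Genus", "species")])
print(no_lifeform[, c("Genus", "species")])
```

The two subsets should contain two and three rows respectively, matching the counts given in the text.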
