Classification trees with categorical explanatory variables
Tree models are a superb tool for helping to write efficient and effective taxonomic keys.
Suppose that all of our explanatory variables are categorical, and that we want to use tree models to write a dichotomous key. There is only one entry for each species, so we want the twigs of the tree to be the individual rows of the dataframe (i.e. we want to fit a tree perfectly to the data). To do this we need to specify two extra arguments: minsize = 2 and mindev = 0. In practice, it is better to specify a very small value for the minimum deviance (say, 10^-6, written 1e-6 in R) rather than zero (see below).
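As a minimal sketch of how these two arguments are supplied, the code below fits a tree to a tiny made-up dataframe in which every row is a different taxon. The dataframe key_data, its taxa A to D and the trait values are hypothetical and exist only for illustration. The sketch assumes the tree package is installed; its tree function passes minsize and mindev on to tree.control.

```r
# Minimal sketch: grow a tree until every row of a small categorical
# dataframe sits in its own leaf. All data below are hypothetical.
library(tree)

key_data <- data.frame(
  taxon  = factor(c("A", "B", "C", "D")),
  stigma = factor(c("lobed", "lobed", "clavate", "clavate")),
  seeds  = factor(c("none", "appendage", "none", "appendage"))
)

# minsize = 2 allows a node holding just two rows to be split;
# mindev = 1e-6 is the very small minimum deviance used in place of zero.
key_tree <- tree(taxon ~ ., data = key_data, minsize = 2, mindev = 1e-6)

# Count the terminal nodes and recover the fitted class for each row.
n_leaves <- sum(key_tree$frame$var == "<leaf>")
fitted_taxon <- predict(key_tree, key_data, type = "class")

print(key_tree)
```

If the tree has been fitted perfectly, n_leaves should equal the number of rows in key_data, and each row should be assigned to its own taxon.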
The following example relates to the nine lowland British species in the genus Epilobium (Onagraceae). We have eight categorical explanatory variables and we want to find the optimal dichotomous key. The dataframe looks like this:
epilobium<-read.table("c:\\temp\\epilobium.txt",header=T)
attach(epilobium)
epilobium

      species  stigma stem.hairs glandular.hairs     seeds pappilose
1    hirsutum   lobed  spreading          absent      none   uniform
2 parviflorum   lobed  spreading          absent      none   uniform
3    montanum   lobed  spreading         present      none   uniform
4 lanceolatum   lobed  spreading         present      none   uniform
5  tetragonum clavate  appressed         present      none   uniform
6    obscurum clavate  appressed         present      none   uniform
7      roseum clavate  spreading         present      none   uniform
8    palustre clavate  spreading         present appendage   uniform
9    ciliatum clavate  spreading         present appendage    ridged

  stolons petals base
1 ...