The match Function

The match function answers the question ‘Where do the values in the first vector appear in the second vector?’. This is much easier to understand with an example:

first<-c(5,8,3,5,3,6,4,4,2,8,8,8,4,4,6)
second<-c(8,6,4,2)
match(first,second)

[1]  NA  1  NA  NA  NA  2  3  3  4  1  1  1  3  3  2

The first thing to note is that match produces a vector of subscripts (index values) and that these are subscripts within the second vector. The length of the vector produced by match is the length of the first vector (15 in this example). If elements of the first vector do not occur anywhere in the second vector, then match produces NA.
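The properties just described can be checked directly. The following is a minimal sketch using the same two vectors; the name idx is introduced here purely for illustration. It also shows that, when a value occurs more than once in the second argument, match reports only the position of its first occurrence.

```r
first <- c(5,8,3,5,3,6,4,4,2,8,8,8,4,4,6)
second <- c(8,6,4,2)

# idx holds subscripts into second, one per element of first
idx <- match(first, second)

# the result is as long as first, not second
length(idx) == length(first)

# using idx to subscript second recovers the matched values of first,
# with NA wherever the value of first is absent from second
second[idx]

# only the first matching position is reported for repeated values
match(4, c(4, 4, 8))
```

Subscripting second with the output of match is the basis of the lookup idiom that makes the function useful in practice.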

Why would you ever want to use this? Suppose you wanted to give drug A to all the patients in the first vector that were identified in the second vector, and drug B to all the others (i.e. those identified by NA in the output of match, above, because they did not appear in the second vector). You create a vector called drug with two elements (A and B), then select the appropriate drug on the basis of whether or not match(first,second) is NA:

drug<-c("A","B")
drug[1+is.na(match(first,second))]

[1] "B" "A" "B" "B" "B" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"
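The subscript 1+is.na(...) works because the logical values FALSE and TRUE are coerced to 0 and 1 in arithmetic, so matched patients get subscript 1 ("A") and unmatched patients get subscript 2 ("B"). As a sketch of an alternative that some readers may find easier to read, the same allocation can be written with %in% and ifelse; the names by_match and by_in are illustrative only.

```r
first <- c(5,8,3,5,3,6,4,4,2,8,8,8,4,4,6)
second <- c(8,6,4,2)
drug <- c("A","B")

# original idiom: is.na gives FALSE/TRUE, which become 0/1 when 1 is added
by_match <- drug[1 + is.na(match(first, second))]

# alternative: %in% tests membership directly, ifelse picks the label
by_in <- ifelse(first %in% second, "A", "B")

# both approaches give the same allocation of drugs
identical(by_match, by_in)
```

The %in% version only reports whether a match exists; match itself is needed when the position of the match matters.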

The match function can also be very powerful in manipulating dataframes to mimic the functionality of a relational database (p. 127).
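As a rough illustration of the relational idea, match can pull a column from one dataframe into another by a shared key. The dataframes patients and lookup, and their columns id and dose, are hypothetical and invented for this sketch; they are not taken from the example on p. 127.

```r
# hypothetical table of patients, identified by id
patients <- data.frame(id = c(5, 8, 3, 6))

# hypothetical lookup table giving a dose for some ids
lookup <- data.frame(id = c(8, 6, 4, 2), dose = c(10, 20, 30, 40))

# match gives, for each patient, the row of lookup with the same id
rows <- match(patients$id, lookup$id)

# subscripting by those rows copies the dose across; unmatched ids get NA
patients$dose <- lookup$dose[rows]
patients
```

This behaves like a left join on id: every row of patients is kept, and patients with no entry in lookup receive NA.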
