The sample Function

This function shuffles the contents of a vector into a random sequence while maintaining all the numerical values intact. It is extremely useful for randomization in experimental design, in simulation and in computationally intensive hypothesis testing. Here is the original y vector again:

y
[1]  8  3  5  7  6  6  8  9  2  3  9  4  10  4  11

and here are two samples of y:

sample(y)

[1]  8  8  9  9  2  10  6  7  3  11  5  4  6  3  4

sample(y)

[1]  9  3  9  8  8  6  5  11  4  6  4  7  3  2  10
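
Sampling without replacement only reorders the elements, so sorting a shuffled copy should recover the sorted original. The check below is a minimal sketch. The call to set.seed() is not used in the text; it is added only to make the run reproducible, so the shuffles it produces will differ from those printed above.

```r
# The y vector from the text
y <- c(8, 3, 5, 7, 6, 6, 8, 9, 2, 3, 9, 4, 10, 4, 11)

set.seed(1)            # fix the random number stream so the run is reproducible
shuffled <- sample(y)  # a random permutation of y

# Same values with the same multiplicities, generally in a different order
identical(sort(shuffled), sort(y))
# [1] TRUE
```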

The order of the values is (almost certainly) different each time that sample is invoked, but the same numbers appear in every case, because each element of y is used exactly once. This is called sampling without replacement. You can specify the size of the sample you want as an optional second argument:

sample(y,5)

[1]  9  4  10  8  11

sample(y,5)

[1]  9  3  4  2  8
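
With a size argument, sample still draws without replacement. It picks that many distinct positions of y, so a value can appear twice in the result only if it occurs twice in y (as 9 does here). In current versions of R, sample(x, size) for a vector x of length greater than one indexes x with positions drawn by sample.int(). The sketch below checks this with a fixed seed and also shows that asking for more elements than y contains fails when replace is FALSE.

```r
y <- c(8, 3, 5, 7, 6, 6, 8, 9, 2, 3, 9, 4, 10, 4, 11)

set.seed(2)
five <- sample(y, 5)                 # five values from five distinct positions

set.seed(2)
idx <- sample.int(length(y), 5)      # the same five positions, drawn directly
same <- identical(five, y[idx])      # sample() indexes y by these positions
same
# [1] TRUE
anyDuplicated(idx)                   # 0 means no position is used twice
# [1] 0

# Without replacement, the sample cannot be larger than y itself
too_big <- tryCatch(sample(y, length(y) + 1), error = function(e) "error")
too_big
# [1] "error"
```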

The option replace=T allows for sampling with replacement, which is the basis of bootstrapping (see p. 320). The vector produced by the sample function with replace=T is the same length as the vector sampled, but some values are left out at random and other values, again at random, appear two or more times. In this sample, 10 has been left out (as have 5 and 7), and there are now three 9s:

sample(y,replace=T)

[1]  9  6  11  2  9  4  6  8  8  4  4  4  3  9  3
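
Rather than counting repeats by eye, you can tabulate a resample with table(). Converting the resample to a factor whose levels are the distinct values of y (a choice made here so that values left out appear with a count of zero) makes both omissions and repeats easy to spot. As before, set.seed() is added only for reproducibility.

```r
y <- c(8, 3, 5, 7, 6, 6, 8, 9, 2, 3, 9, 4, 10, 4, 11)

set.seed(3)
boot <- sample(y, replace = TRUE)   # same length as y, drawn with replacement

# Count each distinct value of y in the resample; a 0 marks a value left out
counts <- table(factor(boot, levels = sort(unique(y))))
counts

# Distinct values of y that did not appear in this resample at all
setdiff(y, boot)
```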

In this next case, there are two 10s and only one 9:

sample(y,replace=T)

[1]  3  7  10  6  8  2  5  11  4  6  3  9  10  7  4
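
Sampling with replacement is what makes the bootstrap work, so a minimal sketch of the idea may help. The sketch resamples y many times, computes a statistic (here the mean) for each resample, and looks at the spread of those values. The number of resamples (1000) and the use of the 2.5% and 97.5% quantiles as a simple percentile interval are illustrative assumptions, not taken from the text.

```r
y <- c(8, 3, 5, 7, 6, 6, 8, 9, 2, 3, 9, 4, 10, 4, 11)

set.seed(4)
n_boot <- 1000   # number of bootstrap resamples (an arbitrary illustrative choice)

# Mean of each resample drawn with replacement from y
boot_means <- replicate(n_boot, mean(sample(y, replace = TRUE)))

mean(y)                                 # the observed mean
sd(boot_means)                          # bootstrap estimate of its standard error
quantile(boot_means, c(0.025, 0.975))   # simple percentile interval
```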

More advanced options in sample include specifying different probabilities with which each element is to be sampled (prob=). For example, if we want to take four numbers at random from ...
