Subscripts and Indices

Working effectively with dataframes depends on being completely at ease with subscripts (or indices, as some people call them). In R, subscripts appear in square brackets []. A dataframe is a two-dimensional object, comprising rows and columns. The rows are referred to by the first (left-hand) subscript, the columns by the second (right-hand) subscript. Thus

worms[3,5]

[1]  4.3

is the value of Soil.pH (the variable in column 5) in row 3. To extract a range of values (say the 14th to 19th rows) from worm density (the variable in the seventh column) we use the colon operator : to generate a series of subscripts (14, 15, 16, 17, 18 and 19):

worms[14:19,7]

[1] 0 6 8 4 5 1
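The same two operations can be tried without the worms data. The following is a minimal sketch using a small stand-in dataframe; the name toy, its column names and its values are all invented for illustration and are not part of the worms dataset.

```r
# Hypothetical stand-in dataframe: names and values are invented for illustration
toy <- data.frame(
  Name  = c("a", "b", "c", "d", "e", "f"),
  Area  = c(1.5, 2.0, 3.5, 4.0, 5.5, 6.0),
  Count = c(10L, 20L, 30L, 40L, 50L, 60L)
)

# Row 3, column 2: first subscript is the row, second is the column
one_value <- toy[3, 2]

# Rows 2 to 4 of column 3: the colon operator generates the subscripts 2, 3, 4
some_counts <- toy[2:4, 3]

print(one_value)
print(some_counts)
```

Swapping the two subscripts, as in toy[2, 3], would pick row 2 of column 3 instead, so the order matters.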

To extract a group of rows and a group of columns, you need to generate a series of subscripts for both the row and column subscripts. Suppose we want Area and Slope (columns 2 and 3) from rows 1 to 5:

worms[1:5,2:3]

     Area   Slope
1     3.6      11
2     5.1       2
3     2.8       3
4     2.4       5
5     3.8       0
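As a sketch of the same idea on invented data, the block below takes a range of rows and a range of columns at once. The dataframe toy and its contents are hypothetical, made up only so that the example runs on its own.

```r
# Hypothetical stand-in dataframe: names and values are invented for illustration
toy <- data.frame(
  Name  = c("a", "b", "c", "d", "e", "f"),
  Area  = c(1.5, 2.0, 3.5, 4.0, 5.5, 6.0),
  Count = c(10L, 20L, 30L, 40L, 50L, 60L)
)

# Rows 1 to 3 and columns 2 to 3: a series of subscripts on both sides of the comma
block <- toy[1:3, 2:3]

print(block)
print(dim(block))  # number of rows, then number of columns
```

Because more than one column is selected, the result is itself a dataframe that keeps the original column names.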

This next point is very important, and is hard to grasp without practice. To select all the entries in a row the syntax is ‘number comma blank’. Similarly, to select all the entries in a column the syntax is ‘blank comma number’. Thus, to select all the columns in row 3

worms[3,]

       Field.Name  Area   Slope   Vegetation   Soil.pH    Damp   Worm.density
3   Nursery.Field   2.8       3    Grassland       4.3   FALSE              2

whereas to select all of the rows in column number 3 we enter

worms[,3]

[1] 11 2 3 5 0 2 3 0 0 4 10 1 2 6 0 0 8 2 1 1 0 ...
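The ‘number comma blank’ and ‘blank comma number’ forms can be practised on a small example. This is a sketch on an invented dataframe (toy and its contents are hypothetical); it also illustrates that a whole row comes back as a one-row dataframe, while a single column comes back as a plain vector unless drop = FALSE is supplied.

```r
# Hypothetical stand-in dataframe: names and values are invented for illustration
toy <- data.frame(
  Name  = c("a", "b", "c", "d", "e", "f"),
  Area  = c(1.5, 2.0, 3.5, 4.0, 5.5, 6.0),
  Count = c(10L, 20L, 30L, 40L, 50L, 60L)
)

# 'number comma blank': every column of row 3, returned as a one-row dataframe
third_row <- toy[3, ]

# 'blank comma number': every row of column 2, returned as a vector
second_col <- toy[, 2]

# drop = FALSE keeps the single column as a dataframe rather than a vector
second_col_df <- toy[, 2, drop = FALSE]

print(third_row)
print(second_col)
print(class(second_col_df))
```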
