- R in a Nutshell

Often, you’ll be given far more data than a particular question requires. For example, suppose that you were working with patient records at a hospital. You might want to analyze healthcare records for patients between 5 and 13 years of age who were treated for asthma during the past 3 years. To do this, you need to take a subset of the data rather than examine the whole database.

Other times, you might have too much relevant data. For example, suppose that you were looking at a logistics operation that fills billions of orders every year. R keeps the data it works with in memory, so it can hold only a limited number of records and might not be able to load the entire database. In most cases, you can get statistically significant results from a tiny fraction of the data; even a sample of millions of orders might be more than you need.
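One common way to work with a small fraction of a large table is to draw a random sample of rows. The sketch below is only an illustration: the `orders` data frame and its columns are hypothetical and filled with simulated values, and the sample size of 1,000 rows (1%) is an arbitrary choice. Whether a given sample size is large enough depends on the analysis you plan to run.

```r
# Hypothetical order table, simulated only for illustration
set.seed(1)
orders <- data.frame(
  order.id = 1:100000,
  amount   = round(runif(100000, min = 1, max = 500), 2)
)

# Pick 1,000 row numbers at random; sample() draws without replacement by default
sampled.rows <- sample(nrow(orders), size = 1000)

# Index the data frame with those row numbers to extract the sample
orders.sample <- orders[sampled.rows, ]
nrow(orders.sample)   # 1000
```

Here the row index is a vector of integers rather than logical values; both forms work inside the brackets.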

One way to take a subset of a data set is to use the bracket notation. As you may recall, you can select rows in a data frame by providing a vector of logical values. If you can write a simple expression that describes which rows to select from a data frame, you can use that expression as the row index.
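As a minimal sketch of the idea, the following uses a small, made-up data frame loosely modeled on the hospital example (the names `patients`, `age`, and `asthma` are hypothetical). Each comparison produces a logical vector with one element per row, and `&` combines those vectors element by element.

```r
# Hypothetical patient records, invented for illustration only
patients <- data.frame(
  id     = 1:6,
  age    = c(4, 7, 12, 15, 9, 30),
  asthma = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE)
)

# One logical value per row: TRUE where the row meets every condition
keep <- patients$age >= 5 & patients$age <= 13 & patients$asthma
print(keep)   # [1] FALSE  TRUE FALSE FALSE  TRUE FALSE

# Use the logical vector as the row index; the empty column index keeps all columns
children.with.asthma <- patients[keep, ]
print(children.with.asthma$id)   # [1] 2 5
```

Note the trailing comma inside the brackets: the expression before the comma selects rows, and leaving the position after it empty selects every column.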

For example, suppose that we wanted to select only batting data from 2008. The column `batting.w.names$yearID` contains the year associated with each row, so we could calculate a vector of logical values describing which rows to keep with the expression `batting.w.names$yearID==2008`. Now we just have to index the data frame `batting.w.names` with this vector to select only rows for the year 2008:

`> `**batting.w.names.2008 <- batting.w.names[batting.w.names$yearID==2008,] ...**
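One detail worth knowing when filtering this way: if the column contains missing values, the comparison returns `NA` for those rows, and bracket indexing turns each `NA` into a row of `NA`s. The sketch below uses a tiny hypothetical stand-in for `batting.w.names` (the `batting` data frame and its values are invented) to compare plain bracket indexing with two alternatives from base R: wrapping the condition in `which()`, and calling `subset()`, which treats a missing condition as `FALSE`.

```r
# Hypothetical stand-in for batting.w.names, with one missing yearID
batting <- data.frame(
  playerID = c("a01", "b02", "c03", "d04"),
  yearID   = c(2007, 2008, NA, 2008),
  HR       = c(12, 30, 5, 18)
)

# The comparison yields FALSE, TRUE, NA, TRUE;
# the NA position comes back as a row filled with NAs
b1 <- batting[batting$yearID == 2008, ]
nrow(b1)   # 3

# which() returns only the positions that are TRUE, so NAs are dropped
b2 <- batting[which(batting$yearID == 2008), ]
nrow(b2)   # 2

# subset() also drops rows where the condition is NA
b3 <- subset(batting, yearID == 2008)
nrow(b3)   # 2
```

With a column that has no missing values, all three forms select the same rows.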