R is a language and environment for statistical computing and graphics, but it’s even more than that. R is a mature open source software project with support from many developers, an interpreted functional language, and an extensible system for data analysis. A large community of contributors has written libraries of functions for R, called packages.
I like to use R to examine baseball statistics because R is very intuitive. A fan can calculate formulas without writing a program. For example, calculating the earned run average (ERA) for a few hundred pitchers is as easy as typing ERA <- 9 * ER / IP.
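ERA is earned runs allowed per nine innings, so the formula multiplies by 9. The following sketch shows why one line is enough for hundreds of pitchers: R applies arithmetic to every element of a vector at once. The three pitchers and their numbers are made up for illustration.

```r
# Hypothetical earned runs and innings pitched for three pitchers
# (made-up values, for illustration only). Innings must be true
# thirds here, not box-score notation like 180.1 for 180 1/3.
ER <- c(60, 50, 42)
IP <- c(180, 225, 108)

# Element-by-element arithmetic: one ERA per pitcher.
ERA <- 9 * ER / IP
ERA
# [1] 3.0 2.0 3.5
```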
We will use R for a few tasks that are difficult (almost impossible) to perform with just a relational database or a spreadsheet, such as building statistical models and creating sophisticated plots and graphs (one of R’s key strengths).
You can download R executables from the project's main web site, http://www.r-project.org, which links to a list of CRAN mirror sites around the world. Pick a mirror that's close to you, although the choice doesn't matter much.
The mirror sites offer precompiled binaries for Windows, Mac OS X, and Linux, as well as the source code. I recommend downloading the precompiled binaries if you can; modifying the R source code is beyond the scope of this book. There's really no point in walking you through the installation on Windows. As with MySQL, R has a very slick installation wizard that will walk you through each step.
The binaries come as a standard install program (for Windows), a disk image (for Mac OS X), an RPM file (for Red Hat Linux), and packages for other Linux distributions. I haven't tested the Linux installations and can't vouch for them, but given how slick the Windows and Mac versions are, I expect they work just as well.
Once the installation is complete, start R. You’ll see a command-line window with a > prompt, similar to a shell. (See Figure 4-2 for an illustration.) Try typing demo(graphics) and pressing Return to start a quick demonstration. R will prompt you to “Hit <Return> to see the next plot,” so press your Enter/Return key to cycle through the demo.
In addition to the base program, R has a number of other packages that we’ll use throughout this book. Similar to the modules in Perl, these packages provide enhanced functionality. For example, the lattice package [Hack #35] allows you to plot multiple graphs on the same plot area easily. Other more specialized packages are available for scientific, financial, and economics applications.
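As a rough illustration of what lattice adds, the sketch below draws one scatterplot panel per league in a single plot area. It assumes lattice is available (it is distributed with R as a recommended package); the data frame, its column names, and its values are hypothetical.

```r
library(lattice)

# Hypothetical pitcher data (made-up values) with a league column.
pitchers <- data.frame(
  IP     = c(180, 225, 108, 200, 150, 95),
  ERA    = c(3.0, 2.0, 3.5, 4.1, 3.8, 2.9),
  league = factor(c("AL", "AL", "AL", "NL", "NL", "NL"))
)

# The "| league" term asks lattice for one panel per league,
# drawn side by side in the same plot area.
p <- xyplot(ERA ~ IP | league, data = pitchers)

# Lattice plots are objects; they are drawn when printed.
print(p)
```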
R makes it really easy to find and use these packages. Go to the Package menu in R and select “Install package(s) from CRAN…” (on Mac OS X, the menu name is R Package Installer). You will see a dialog box like the one shown in Figure 4-3. Select the items that you want from the list and click the OK button.
Sometimes you might find that it's easier to type a command to install a package. To do this in R, use the install.packages() function, which takes a vector with the names of the packages that you want to install. Here is an example that installs the R Commander package (see “Analyze Baseball with R” for more on this package):
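The command is a one-liner. In the sketch below, c() builds the vector of package names, and R Commander's CRAN package name is Rcmdr. The repos argument is an addition not in the original text: it points at the CRAN "cloud" mirror so that R does not stop to ask which mirror to use.

```r
# Character vector of package names to install; add more names
# to install several packages in one call.
pkgs <- c("Rcmdr")

# repos is an assumption added for illustration; omit it to be
# prompted for a mirror instead.
install.packages(pkgs, repos = "https://cloud.r-project.org")
```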
Before using a package in R, you need to load it. You can do this through the GUI or from the command line. To load a package into R, use the library() function with the package name as an argument. For example, the following command loads the lattice graphics package:
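A minimal sketch of loading lattice. The search() check is an extra step added for illustration, not something you need to do routinely; it lists the packages attached to the current session.

```r
# Attach lattice so its functions (xyplot(), bwplot(), ...) can be
# called directly in this session.
library(lattice)

# TRUE once lattice is attached.
"package:lattice" %in% search()
```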