This chapter sets the groundwork for the other chapters. It explains how to download, install, and run R.
When you install R on your computer, a mass of documentation is also installed. You can browse the local documentation (Recipe 1.6) and search it (Recipe 1.8). I am amazed how often I search the Web for an answer only to discover it was already available in the installed documentation.
A task view describes packages that are specific to one area of statistical work, such as econometrics, medical imaging, psychometrics, or spatial statistics. Each task view is written and maintained by an expert in the field. There are 28 such task views, so there is likely to be one or more for your areas of interest. I recommend that every beginner find and read at least one task view in order to gain a sense of R’s possibilities (Recipe 1.11).
Most packages include useful documentation. Many also include overviews and tutorials, called vignettes in the R community. The documentation is kept with the packages in package repositories, such as CRAN, and it is automatically installed on your machine when you install a package.
Volunteers have generously donated many hours of time to answer beginners’ questions that are posted to the R mailing lists. The lists are archived, so you can search the archives for answers to your questions (Recipe 1.12).
On a Q&A site, anyone can post a question, and knowledgeable people can respond. Readers vote on the answers, so the best answers tend to emerge over time. All this information is tagged and archived for searching. These sites are a cross between a mailing list and a social network; the Stack Overflow site is a good example.
The Web is loaded with information about R, and there are R-specific tools for searching it (Recipe 1.10). The Web is a moving target, so be on the lookout for new, improved ways to organize and search information regarding R.
Open http://www.r-project.org/ in your browser.
Click on “CRAN”. You’ll see a list of mirror sites, organized by country.
Select a site near you.
Click on “Windows” under “Download and Install R”.
Click on “base”.
Click on the link for downloading the latest version of R (an .exe file).
When the download completes, double-click on the .exe file and answer the usual questions.
Open http://www.r-project.org/ in your browser.
Click on “CRAN”. You’ll see a list of mirror sites, organized by country.
Select a site near you.
Click on “MacOS X”.
Click on the .pkg file for the latest version of R, under “Files:”, to download it.
When the download completes, double-click on the .pkg file and answer the usual questions.
|Distribution||Package name|
|Ubuntu or Debian||r-base|
|Red Hat or Fedora||R.i386|
Use the system’s package manager to download and install the package. Normally, you will need the root password or sudo privileges; otherwise, ask a system administrator to perform the installation.
Installing R on Windows or OS X is straightforward because there are prebuilt binaries for those platforms. You need only follow the preceding instructions. The CRAN Web pages also contain links to installation-related resources, such as frequently asked questions (FAQs) and tips for special situations (“How do I install R when using Windows Vista?”) that you may find useful.
Theoretically, you can install R on Linux or Unix in one of two ways: by installing a distribution package or by building it from scratch. In practice, installing a package is the preferred route. The distribution packages greatly streamline both the initial installation and subsequent updates.
On Ubuntu or Debian, use apt-get to download and install R:
sudo apt-get install r-base
On Red Hat or Fedora, use yum:
sudo yum install R.i386
Beyond the base packages, I recommend installing the documentation packages, too. On my Ubuntu machine, for example, I installed r-base-html (because I like browsing the hyperlinked documentation) as well as r-doc-html, which installs the important R manuals:
sudo apt-get install r-base-html r-doc-html
Some Linux repositories also include prebuilt copies of R packages available on CRAN. I don’t use them because I’d rather get my software directly from CRAN itself, which usually has the freshest versions.
In rare cases, you may need to build R from scratch. You might have an obscure, unsupported version of Unix; or you might have special considerations regarding performance or configuration. The build procedure on Linux or Unix is quite standard. Download the tarball from the home page of your CRAN mirror; it’s called something like R-2.12.1.tar.gz, except the “2.12.1” will be replaced by the latest version. Unpack the tarball, look for a file called INSTALL, and follow the directions.
R in a Nutshell (O’Reilly) contains more details of downloading and installing R, including instructions for building the Windows and OS X versions. Perhaps the ultimate guide is the one entitled R Installation and Administration, available on CRAN, which describes building and installing R on a variety of platforms.
This recipe is about installing the base package. See Recipe 3.9 for installing add-on packages from CRAN.
Click on Start → All Programs → R; or double-click on the R icon on your desktop (assuming the installer created an icon for you).
When you start R, it opens a new window. The window includes a text pane, called the R Console, where you enter R expressions (see Figure 1-1).
There is an odd thing about the Windows Start menu for R. Every time you upgrade to a new version of R, the Start menu expands to contain the new version while keeping all the previously installed versions. So if you’ve upgraded, you may face several choices such as “R 2.8.1”, “R 2.9.1”, “R 2.10.1”, and so forth. Pick the newest one. (You might also consider uninstalling the older versions to reduce the clutter.)
The installer may have created a desktop icon. If not, creating a shortcut is easy: follow the Start menu to the R program, but instead of left-clicking to run R, press and hold your mouse’s right button on the program name, drag the program name to your desktop, and release the mouse button. Windows will ask if you want to Copy Here or Move Here. Select Copy Here, and the shortcut will appear on your desktop.
Another way to start R is by double-clicking on a .RData file in your working directory. This is the file that R creates to save your workspace. The first time you create a directory, start R and change to that directory. Save your workspace there, either by exiting or using the save.image function. That will create the .RData file. Thereafter, you can simply open the directory in Windows Explorer and then double-click on the .RData file to start R.
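A minimal sketch of the saving step, assuming a hypothetical project directory (a temporary directory stands in for it so the example is safe to run). By default, save.image writes to .RData in the working directory; the explicit file argument is used here only to avoid touching your real working directory.

```r
# Hypothetical project directory; tempfile() keeps the sketch self-contained.
project_dir <- tempfile("my_project")
dir.create(project_dir)

# Something to save: objects created at top level live in the workspace.
x <- c(10, 20, 30)

# save.image() defaults to ".RData" in the working directory; the explicit
# path is only so the example does not write into your current directory.
workspace_file <- file.path(project_dir, ".RData")
save.image(file = workspace_file)

# Confirm that the workspace file now exists.
file.exists(workspace_file)
```

Once the file exists in a real project directory, double-clicking it should start R in that directory, as described above.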
If you start R from the Start menu, the working directory is normally either C:\Documents and Settings\<username>\My Documents (Windows XP) or the equivalent Documents folder in your user profile (Windows Vista, Windows 7). You can override this default by setting the R_USER environment variable to an alternative directory path.
If you start R from a desktop shortcut, you can specify an alternative startup directory that becomes the working directory when R is started. To specify the alternative directory, right-click on the shortcut, select Properties, enter the directory path in the box labeled “Start in”, and click OK.
Starting R by double-clicking on your .RData file is the most straightforward solution to this little problem. R will automatically change its working directory to be the file’s directory, which is usually what you want.
In any event, you can always ask R to report your current working directory (Recipe 3.1).
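For instance, base R’s getwd function reports the working directory and setwd changes it; presumably getwd is the function the text has in mind here. A minimal sketch, using a temporary directory as a stand-in for a project folder:

```r
# Remember where we started so the session can be restored afterwards.
original_dir <- getwd()

# A temporary directory stands in for a real project folder (an assumption).
setwd(tempdir())
current_dir <- getwd()
print(current_dir)

# Return to the original working directory.
setwd(original_dir)
```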
Just for the record, Windows also has a console version of R called Rterm.exe. You’ll find it in the bin subdirectory of your R installation. It is much less convenient than the graphic user interface (GUI) version, and I never use it. I recommend it only for batch (noninteractive) usage such as running jobs from the Windows scheduler. In this book, I assume you are running the GUI version of R, not the console version.
Run R by clicking the R icon in the Applications folder. (If you use R frequently, you can drag it from the folder to the dock.) That will run the GUI version, which is somewhat more convenient than the console version. The GUI version displays your working directory, which is initially your home directory.
OS X also lets you run the console version of R by typing R at the shell prompt.
Simply enter expressions at the command prompt. R will evaluate them and print (display) the result. You can use command-line editing to facilitate typing.
The computer adds one and one, giving two, and displays the result.
The [1] before the 2 might be confusing. To R, the result is a vector, even though it has only one element. R labels the value with [1] to signify that this is the first element of the vector, which is not surprising, since it’s the only element of the vector.
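The interaction described above can be sketched as follows. The last expression is an added illustration, not from the text: with a longer vector, the bracketed label at the start of each output line gives the index of the first element on that line.

```r
# Adding one and one; R prints the single-element result as: [1] 2
sum_result <- 1 + 1
print(sum_result)

# The result is a vector of length one.
length(sum_result)

# With more elements the index labels become useful: each printed line
# starts with the position of its first element.
print(101:130)
```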
R will prompt you for input until you type a complete expression. The expression max(1,3,5) is complete, so R stops reading input and evaluates what it’s got. In contrast, “max(1,3,” is an incomplete expression, so R prompts you for more input. The prompt changes from greater-than (>) to plus (+), letting you know that R expects more.
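A short sketch of the same idea in script form. R reads lines until the expression is syntactically complete, so a call may be split across lines; the variable names are illustrative only.

```r
# A complete expression on one line is evaluated immediately.
largest <- max(1, 3, 5)
print(largest)

# An expression left open at the end of a line continues on the next line;
# at the console, R would show the + prompt before the second line.
largest_again <- max(1, 3,
                     5)
print(largest_again)
```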
It’s easy to mistype commands, and retyping them is tedious and frustrating. So R includes command-line editing to make life easier. It defines single keystrokes that let you easily recall, correct, and reexecute your commands. My own typical command-line interaction goes like this:
I enter an R expression with a typo.
R complains about my mistake.
I press the up-arrow key to recall my mistaken line.
I use the left and right arrow keys to move the cursor back to the error.
I use the Delete key to delete the offending characters.
I type the corrected characters, which inserts them into the command line.
I press Enter to reexecute the corrected command.
That’s just the basics. R supports the usual keystrokes for recalling and editing command lines, as listed in Table 1-1.
Table 1-1. Keystrokes for command-line editing
|Labeled key||Ctrl-key combination||Effect|
|Up arrow||Ctrl-P||Recall previous command by moving backward through the history of commands.|
|Down arrow||Ctrl-N||Move forward through the history of commands.|
|Backspace||Ctrl-H||Delete the character to the left of cursor.|
|Delete (Del)||Ctrl-D||Delete the character to the right of cursor.|
|Home||Ctrl-A||Move cursor to the start of the line.|
|End||Ctrl-E||Move cursor to the end of the line.|
|Right arrow||Ctrl-F||Move cursor right (forward) one character.|
|Left arrow||Ctrl-B||Move cursor left (back) one character.|
|(none)||Ctrl-K||Delete everything from the cursor position to the end of the line.|
|(none)||Ctrl-U||Clear the whole darn line and start over.|
|Tab||(none)||Name completion (on some platforms).|
See Recipe 2.13. From the Windows main menu, follow Help → Console for a complete list of keystrokes useful for command-line editing.
Note the empty parentheses, which are necessary to call the function.
When you exit, R asks whether you want to save your workspace. You have three choices:
Save your workspace and exit.
Don’t save your workspace, but exit anyway.
Cancel, returning to the command prompt rather than exiting.
If you save your workspace, then R writes it to a file called .RData in the current working directory. This will overwrite the previously saved workspace, if any, so don’t save if you don’t like the changes to your workspace (e.g., if you have accidentally erased critical data).
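A hedged sketch of how exiting looks in code, assuming the function in question is base R’s q (also available as quit), which the surrounding text does not name explicitly. Its save argument controls whether the workspace is written. The call is wrapped in a hypothetical helper that is defined but never invoked, because calling it would end the session.

```r
# Typing a function name without parentheses refers to the function object
# itself; only q() with parentheses actually calls it.
print(is.function(q))

# Hypothetical helper, defined but not called here because q() terminates R.
# save = "yes" writes .RData in the working directory, "no" exits without
# saving, and "ask" prompts for a decision in an interactive session.
leave_r <- function(choice = c("ask", "yes", "no")) {
  choice <- match.arg(choice)
  q(save = choice)
}
```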
You want to interrupt a long-running computation and return to the command prompt without exiting R.
Interrupting R can leave your variables in an indeterminate state, depending upon how far the computation had progressed. Check your workspace after interrupting.
See Recipe 1.4.
From there, links are available to all the installed documentation.
The base distribution of R includes a wealth of documentation—literally thousands of pages. When you install additional packages, those packages contain documentation that is also installed on your machine.
It is easy to browse this documentation via the help.start function, which opens a window on the top-level table of contents; see Figure 1-2.
The two links in the Reference section are especially useful:
Click here to access a simple search engine, which allows you to search the documentation by keyword or phrase. There is also a list of common keywords, organized by topic; click one to see the associated pages.
The local documentation is copied from the R Project website, which may have updated documents.
Use help to display the documentation for the function.
Use args for a quick reminder of the function arguments.
Use example to see examples of using the function.
I present many R functions in this book. Every R function has more bells and whistles than I can possibly describe. If a function catches your interest, I strongly suggest reading the help page for that function. One of its bells or whistles might be very useful to you.
Calling help with a function name will either open a window with function documentation or display the documentation on your console, depending upon your platform. A shortcut for the help command is to simply type ? followed by the function name.
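A small sketch of both forms, using mean as the example function. How the page is displayed depends on your platform, so the code below only retrieves the help object and leaves the display to an interactive session.

```r
# help() returns an object describing where the help page lives; printing
# that object is what displays the page in an interactive session.
mean_help <- help("mean")
class(mean_help)

if (interactive()) {
  print(mean_help)  # same effect as typing help(mean) at the prompt
  ?mean             # the shortcut form
}
```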
> args(mean)
function (x, ...)
NULL
> args(sd)
function (x, na.rm = FALSE)
NULL
The first line of output from args is a synopsis of the function call. For mean, the synopsis shows one argument, x, which is a vector of numbers. For sd, the synopsis shows the same vector, x, and an optional argument called na.rm. (You can ignore the second line of output, which is often just NULL.)
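If you want the same information programmatically, one option (an illustration, not from the text) is to inspect the argument list with formals. The sketch below assumes only base R and the stats package, which supplies sd.

```r
# args() returns a function with the same argument list and a NULL body,
# which is why NULL appears as the second line of its printed output.
sd_args <- args(sd)
print(sd_args)

# formals() extracts the arguments and their defaults as a named list.
sd_arg_names <- names(formals(sd_args))
print(sd_arg_names)

mean_arg_names <- names(formals(args(mean)))
print(mean_arg_names)
```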
Most documentation for functions includes examples near the end. A cool feature of R is that you can request that it execute the examples, giving you a little demonstration of the function’s capabilities. The documentation for the mean function, for instance, contains examples, but you don’t need to type them yourself. Just use the example function to watch them run:
> example(mean)

mean> x <- c(0:10, 50)

mean> xm <- mean(x)

mean> c(xm, mean(x, trim = 0.1))
[1] 8.75 5.50

mean> mean(USArrests, trim = 0.2)
  Murder  Assault UrbanPop     Rape
    7.42   167.60    66.20    20.16
The user typed example(mean). Everything else was produced by R, which executed the examples from the help page and displayed the results.
Alternatively, you want to search the installed documentation for a keyword.
You may occasionally request help on a function only to be told R knows nothing about it:
> help(adf.test)
No documentation for 'adf.test' in specified packages and libraries:
you could try 'help.search("adf.test")'
This can be frustrating if you know the function is installed on your machine. Here the problem is that the function’s package is not currently loaded, and you don’t know which package contains the function. It’s a kind of catch-22 (the error message indicates the package is not currently in your search path, so R cannot find the help file; see Recipe 3.5 for more details).
The search will produce a listing of all packages that contain the function:
Help files with alias or concept or title matching 'adf.test' using
regular expression matching:

tseries::adf.test       Augmented Dickey-Fuller Test

Type '?PKG::FOO' to inspect entry 'PKG::FOO TITLE'.
This output indicates that the tseries package contains the adf.test function. You can see its documentation by explicitly telling help which package contains the function. Alternatively, you can insert the tseries package into your search list and repeat the original help command, which will then find the function and display the documentation.
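The two approaches can be sketched as follows. To keep the example runnable without extra packages, it uses median from the stats package as a stand-in; with tseries installed, the same pattern would apply to adf.test.

```r
# Search the installed documentation; restricting the search to one package
# (an optional argument) keeps this sketch quick.
hits <- help.search("median", package = "stats")
class(hits)

# Tell help which package contains the function. With tseries installed,
# the analogous call would be help("adf.test", package = "tseries").
median_help <- help("median", package = "stats")

# The alternative is to attach the package first, e.g. library(tseries),
# and then repeat the plain help(adf.test) call.
if (interactive()) {
  print(hits)
  print(median_help)
}
```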
You can broaden your search by using keywords. R will then find any installed documentation that contains the keywords. Suppose you want to find all functions that mention the Augmented Dickey–Fuller (ADF) test. You could search on a likely pattern, such as “dickey-fuller”.
On my machine, the result looks like this because I’ve installed two additional packages (fUnitRoots and urca) that implement the ADF test:
Help files with alias or concept or title matching 'dickey-fuller' using
fuzzy matching:

fUnitRoots::DickeyFullerPValues
                        Dickey-Fuller p Values
tseries::adf.test       Augmented Dickey-Fuller Test
urca::ur.df             Augmented-Dickey-Fuller Unit Root Test

Type '?PKG::FOO' to inspect entry 'PKG::FOO TITLE'.
Use the help function and specify a package name (without a function name).
Sometimes you want to know the contents of a package (the functions and datasets). This is especially true after you download and install a new package, for example. The help function can provide the contents plus other information once you specify the package name.
Calling help with only the package name tseries, for example, will display the information for the tseries package, a time series package available from CRAN.
The information begins with a description and continues with an index of functions and datasets. On my machine, the first few lines look like this:
Information on package 'tseries'

Description:

Package:            tseries
Version:            0.10-22
Date:               2009-11-22
Title:              Time series analysis and computational finance
Author:             Compiled by Adrian Trapletti <firstname.lastname@example.org>
Maintainer:         Kurt Hornik <Kurt.Hornik@R-project.org>
Description:        Package for time series analysis and computational finance
Depends:            R (>= 2.4.0), quadprog, stats, zoo
Suggests:           its
Imports:            graphics, stats, utils
License:            GPL-2
Packaged:           2009-11-22 19:03:45 UTC; hornik
Repository:         CRAN
Date/Publication:   2009-11-22 19:06:50
Built:              R 2.10.0; i386-pc-mingw32; 2009-12-01 19:32:47 UTC; windows

Index:

NelPlo              Nelson-Plosser Macroeconomic Time Series
USeconomic          U.S. Economic Variables
adf.test            Augmented Dickey-Fuller Test
arma                Fit ARMA Models to Time Series
.
. (etc.)
.
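A sketch of the same kind of call for a package that ships with every R installation, so it runs without installing anything. The stats package stands in for tseries here; packageDescription and ls are shown as additional ways to get at the description fields and the index, and are not mentioned in the text.

```r
# help(package = ...) collects the description and index of a package.
stats_info <- help(package = "stats")
if (interactive()) print(stats_info)

# Individual description fields are also available directly.
stats_version <- packageDescription("stats")$Version
print(stats_version)

# Exported functions and datasets of an attached package.
stats_contents <- ls("package:stats")
head(stats_contents)
```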
Some packages also include vignettes, which are additional documents such as introductions, tutorials, or reference cards. They are installed on your computer as part of the package documentation when you install the package. The help page for a package includes a list of its vignettes near the bottom.
You can see the vignettes for a particular package by including its name:
Each vignette has a name, which you use to view the vignette:
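A sketch of the calls these two sentences refer to, assuming the vignette function from the utils package. The grid package is used as a stand-in because it is distributed with R; the vignette name "grid" is an assumption about what that package provides on your installation.

```r
# List the vignettes of one package (omit the argument to list them all).
grid_vignettes <- vignette(package = "grid")
class(grid_vignettes)

if (interactive()) {
  # Shows each vignette's name and title.
  print(grid_vignettes)
  # Opens one vignette by name.
  print(vignette("grid", package = "grid"))
}
```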
See Recipe 1.7 for getting help on a particular function in a package.
Inside your browser, try using these sites for searching: rseek.org, Stack Overflow, and the Stack Exchange area for statistical analysis.
The RSiteSearch function will open a browser window and direct it to the search engine on the R Project website. There you will see an initial search that you can refine. For example, calling it with the phrase “canonical correlation” would start a search for that term.
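A minimal sketch of such a call, assuming the RSiteSearch function from the utils package. It opens a browser window and needs a network connection, so it is guarded to run only in an interactive session.

```r
search_terms <- "canonical correlation"

if (interactive()) {
  # Opens the search results for the phrase in your web browser.
  RSiteSearch(search_terms)
}
```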
This is quite handy for doing quick web searches without leaving R. However, the search scope is limited to R documentation and the mailing-list archives.
The rseek.org site provides a wider search. Its virtue is that it harnesses the power of the Google search engine while focusing on sites relevant to R. That eliminates the extraneous results of a generic Google search. The beauty of rseek.org is that it organizes the results in a useful way.
Figure 1-3 shows the results of visiting rseek.org and searching for “canonical correlation”. The left side of the page shows general results from searching R sites. The right side is a tabbed display that organizes the search results into several categories.
If you click on the Introductions tab, for example, you’ll find tutorial material. The Task Views tab will show any Task View that mentions your search term. Likewise, clicking on Functions will show links to relevant R functions. This is a good way to zero in on search results.
Stack Overflow is a so-called Q&A site, which means that anyone can submit a question and experienced users will supply answers—often there are multiple answers to each question. Readers vote on the answers, so good answers tend to rise to the top. This creates a rich database of Q&A dialogs, which you can search. Stack Overflow is strongly problem oriented, and the topics lean toward the programming side of R.
Stack Overflow hosts questions for many programming languages; therefore, when entering a term into their search box, prefix it with “[r]” to focus the search on questions tagged for R. For example, searching via “[r] standard error” will select only the questions tagged for R and will avoid the Python and C++ questions.
Stack Exchange (not Overflow) has a Q&A area for Statistical Analysis. The area is more focused on statistics than programming, so use this site when seeking answers that are more concerned with statistics in general and less with R in particular.
If your search reveals a useful package, use Recipe 3.9 to install it on your machine.
Visit the list of task views at http://cran.r-project.org/web/views/. Find and read the task view for your area, which will give you links to and descriptions of relevant packages. Or visit http://rseek.org, search by keyword, click on the Task Views tab, and select an applicable task view.
Visit crantastic and search for packages by keyword.
To find relevant functions, visit http://rseek.org, search by name or keyword, and click on the Functions tab.
This problem is especially vexing for beginners. You think R can solve your problems, but you have no idea which packages and functions would be useful. A common question on the mailing lists is: “Is there a package to solve problem X?” That is the silent scream of someone drowning in R.
As of this writing, there are more than 2,000 packages available for free download from CRAN. Each package has a summary page with a short description and links to the package documentation. Once you’ve located a potentially interesting package, you would typically click on the “Reference manual” link to view the PDF documentation with full details. (The summary page also contains download links for installing the package, but you’ll rarely install the package that way; see Recipe 3.9.)
Sometimes you simply have a generic interest—such as Bayesian analysis, econometrics, optimization, or graphics. CRAN contains a set of task view pages describing packages that may be useful. A task view is a great place to start since you get an overview of what’s available. You can see the list of task view pages at http://cran.r-project.org/web/views/ or search for them as described in the Solution.
Suppose you happen to know the name of a useful package—say, by seeing it mentioned online. A complete, alphabetical list of packages is available at http://cran.r-project.org/web/packages/ with links to the package summary pages.
You can download and install an R package called sos that provides other powerful ways to search for packages; see the vignette at http://cran.r-project.org/web/packages/sos/vignettes/sos.pdf.
Open http://rseek.org in your browser. Search for a keyword or other search term from your question. When the search results appear, click on the “Support Lists” tab.
The initial search results will appear in a browser. Under “Target”, select the R-help sources, clear the other sources, and resubmit your query.
This recipe is really just an application of Recipe 1.10. But it’s an important application because you should search the mailing list archives before submitting a new question to the list. Your question has probably been answered before.
CRAN has a list of additional resources for searching the Web; see http://cran.r-project.org/search.html.
The Mailing Lists page contains general information and instructions for using the R-help mailing list. Here is the general process:
Subscribe to the R-help list at the Main R Mailing List.
Read the Posting Guide for instructions on writing an effective submission.
Write your question carefully and correctly. If appropriate, include a minimal self-reproducing example so that others can reproduce your error or problem.
Mail your question to the R-help posting address given on the Mailing Lists page.
The R mailing list is a powerful resource, but please treat it as a last resort. Read the help pages, read the documentation, search the help list archives, and search the Web. It is most likely that your question has already been answered. Don’t kid yourself: very few questions are unique.
After writing your question, submitting it is easy. Just mail it to the R-help posting address. You must be a list subscriber, however; otherwise your email submission may be rejected.
Construct the smallest snippet of R code that displays your problem. Remove everything that is irrelevant.
Include the data necessary to exactly reproduce the error.
If the list readers can’t reproduce it, they can’t diagnose it.
For complicated data structures, have R create an ASCII representation of your data and include it in your message.
Including an example clarifies your question and greatly increases the probability of getting a useful answer.
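One way to produce such an ASCII representation is base R’s dput function, which is presumably the function meant above (the name is missing from the text, so treat that as an assumption). The sketch uses a made-up data frame; dput writes R code that recreates the object, which readers can paste into their own session.

```r
# Made-up data for illustration only.
problem_data <- data.frame(
  id = 1:3,
  group = c("a", "b", "a"),
  stringsAsFactors = FALSE
)

# dput() prints an ASCII representation that can be pasted into a message.
dput(problem_data)

# A reader can rebuild the object from that text.
ascii_text <- capture.output(dput(problem_data))
rebuilt <- eval(parse(text = ascii_text))
identical(problem_data, rebuilt)
```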
There are actually several mailing lists. R-help is the main list for general questions. There are also many special interest group (SIG) mailing lists dedicated to particular domains such as genetics, finance, R development, and even R jobs. You can see the full list at https://stat.ethz.ch/mailman/listinfo. If your question is specific to one such domain, you’ll get a better answer by selecting the appropriate list. As with R-help, however, carefully search the SIG list archives before submitting your question.
An excellent essay by Eric Raymond and Rick Moen is entitled “How to Ask Questions the Smart Way”. I suggest that you read it before submitting any question.