You are previewing R in a Nutshell, 2nd Edition.

R in a Nutshell, 2nd Edition

Cover of R in a Nutshell, 2nd Edition by Joseph Adler Published by O'Reilly Media, Inc.
  1. R in a Nutshell
  2. Preface
    1. Why I Wrote This Book
    2. When Should You Use R?
    3. What’s New in the Second Edition?
    4. R License Terms
    5. Examples
    6. How This Book Is Organized
    7. Conventions Used in This Book
    8. Using Code Examples
    9. Safari® Books Online
    10. How to Contact Us
    11. Acknowledgments
  3. I. R Basics
    1. 1. Getting and Installing R
      1. R Versions
      2. Getting and Installing Interactive R Binaries
    2. 2. The R User Interface
      1. The R Graphical User Interface
      2. The R Console
      3. Batch Mode
      4. Using R Inside Microsoft Excel
      5. RStudio
      6. Other Ways to Run R
    3. 3. A Short R Tutorial
      1. Basic Operations in R
      2. Functions
      3. Variables
      4. Introduction to Data Structures
      5. Objects and Classes
      6. Models and Formulas
      7. Charts and Graphics
      8. Getting Help
    4. 4. R Packages
      1. An Overview of Packages
      2. Listing Packages in Local Libraries
      3. Loading Packages
      4. Exploring Package Repositories
      5. Installing Packages From Other Repositories
      6. Custom Packages
  4. II. The R Language
    1. 5. An Overview of the R Language
      1. Expressions
      2. Objects
      3. Symbols
      4. Functions
      5. Objects Are Copied in Assignment Statements
      6. Everything in R Is an Object
      7. Special Values
      8. Coercion
      9. The R Interpreter
      10. Seeing How R Works
    2. 6. R Syntax
      1. Constants
      2. Operators
      3. Expressions
      4. Control Structures
      5. Accessing Data Structures
      6. R Code Style Standards
    3. 7. R Objects
      1. Primitive Object Types
      2. Vectors
      3. Lists
      4. Other Objects
      5. Attributes
    4. 8. Symbols and Environments
      1. Symbols
      2. Working with Environments
      3. The Global Environment
      4. Environments and Functions
      5. Exceptions
    5. 9. Functions
      1. The Function Keyword
      2. Arguments
      3. Return Values
      4. Functions as Arguments
      5. Argument Order and Named Arguments
      6. Side Effects
    6. 10. Object-Oriented Programming
      1. Overview of Object-Oriented Programming in R
      2. Object-Oriented Programming in R: S4 Classes
      3. Old-School OOP in R: S3
  5. III. Working with Data
    1. 11. Saving, Loading, and Editing Data
      1. Entering Data Within R
      2. Saving and Loading R Objects
      3. Importing Data from External Files
      4. Exporting Data
      5. Importing Data From Databases
      6. Getting Data from Hadoop
    2. 12. Preparing Data
      1. Combining Data Sets
      2. Transformations
      3. Binning Data
      4. Subsets
      5. Summarizing Functions
      6. Data Cleaning
      7. Finding and Removing Duplicates
      8. Sorting
  6. IV. Data Visualization
    1. 13. Graphics
      1. An Overview of R Graphics
      2. Graphics Devices
      3. Customizing Charts
    2. 14. Lattice Graphics
      1. History
      2. An Overview of the Lattice Package
      3. High-Level Lattice Plotting Functions
      4. Customizing Lattice Graphics
      5. Low-Level Functions
    3. 15. ggplot2
      1. A Short Introduction
      2. The Grammar of Graphics
      3. A More Complex Example: Medicare Data
      4. Quick Plot
      5. Creating Graphics with ggplot2
      6. Learning More
  7. V. Statistics with R
    1. 16. Analyzing Data
      1. Summary Statistics
      2. Correlation and Covariance
      3. Principal Components Analysis
      4. Factor Analysis
      5. Bootstrap Resampling
    2. 17. Probability Distributions
      1. Normal Distribution
      2. Common Distribution-Type Arguments
      3. Distribution Function Families
    3. 18. Statistical Tests
      1. Continuous Data
      2. Discrete Data
    4. 19. Power Tests
      1. Experimental Design Example
      2. t-Test Design
      3. Proportion Test Design
      4. ANOVA Test Design
    5. 20. Regression Models
      1. Example: A Simple Linear Model
      2. Details About the lm Function
      3. Subset Selection and Shrinkage Methods
      4. Nonlinear Models
      5. Survival Models
      6. Smoothing
      7. Machine Learning Algorithms for Regression
    6. 21. Classification Models
      1. Linear Classification Models
      2. Machine Learning Algorithms for Classification
    7. 22. Machine Learning
      1. Market Basket Analysis
      2. Clustering
    8. 23. Time Series Analysis
      1. Autocorrelation Functions
      2. Time Series Models
  8. VI. Additional Topics
    1. 24. Optimizing R Programs
      1. Measuring R Program Performance
      2. Optimizing Your R Code
      3. Other Ways to Speed Up R
    2. 25. Bioconductor
      1. An Example
      2. Key Bioconductor Packages
      3. Data Structures
      4. Where to Go Next
    3. 26. R and Hadoop
      1. R and Hadoop
      2. Other Packages for Parallel Computation with R
      3. Where to Learn More
  9. A. R Reference
    1. base
      1. Functions
      2. Data Sets
    2. boot
      1. Functions
      2. Data Sets
    3. class
      1. Functions
    4. cluster
      1. Functions
      2. Data Sets
    5. codetools
    6. foreign
      1. Functions
    7. grDevices
      1. Functions
      2. Data Sets
    8. graphics
      1. Functions
    9. grid
    10. KernSmooth
      1. Functions
    11. lattice
      1. Functions
      2. Data Sets
    12. MASS
      1. Functions
      2. Data Sets
    13. methods
      1. Functions
    14. mgcv
    15. nlme
    16. nnet
      1. Functions
    17. rpart
      1. Functions
      2. Data Sets
    18. spatial
      1. Functions
    19. splines
      1. Functions
    20. stats
      1. Functions
      2. Data Set
    21. stats4
      1. Functions
    22. survival
      1. Functions
      2. Data Sets
    23. tcltk
    24. tools
      1. Functions
      2. Data Sets
    25. utils
      1. Functions
  10. Bibliography
  11. Index
  12. About the Author
  13. Colophon
  14. Copyright
O'Reilly logo

Optimizing Your R Code

Once you figure out where your program is spending its time, you can focus on improving those areas. This section describes some common causes for poor performance and shows how to resolve them.

Using Vector Operations

R is a functional language with built-in support for vector operations. Whenever possible, you should use vector operations in your code and not write iterative algorithms. This section explains why.

Iterative algorithms and vector operations

Let’s consider a simple problem: calculating a vector with the square of every integer between 1 and n. Consider the following naive implementation:

> naive.vector.of.squares <- function(n) {
+   v <- 1:n
+   for (i in 1:n)
+     v[i] <- v[i]^2
+   v
+ }
> naive.vector.of.squares(10)
 [1]   1   4   9  16  25  36  49  64  81 100

How does the performance of this function vary with n? Let’s do a quick experiment:

> # 10,000 values
> system.time(naive.vector.of.squares(10000))
   user  system elapsed
  0.037   0.000   0.037
> # 10,000,000 values
> system.time(naive.vector.of.squares(10000000))
   user  system elapsed
 30.211   0.233  30.178

As you can see, the time required to compute the vector varies linearly with the size of the vector (n). This makes sense: R is looping through all n elements in the vector and changing each element one at a time. (Note that R doesn’t actually copy the vector v repeatedly inside the loop; see Objects Are Copied in Assignment Statements for more about how this works.)

It turns out that there is a much better way to implement ...

The best content for your career. Discover unlimited learning on demand for around $1/day.