Efficient R Programming

Book Description

There are many excellent R resources for visualization, data science, and package development. Hundreds of scattered vignettes, web pages, and forums explain how to use R in particular domains. But little has been written on how to simply make R work effectively—until now. This hands-on book teaches novices and experienced R users how to write efficient R code. Drawing on years of experience teaching R courses, authors Colin Gillespie and Robin Lovelace provide practical advice on a range of topics—from optimizing the set-up of RStudio to leveraging C++—that make this book a useful addition to any R user’s bookshelf.

Table of Contents

  1. Preface
    1. Conventions Used in This Book
    2. Using Code Examples
    3. O’Reilly Safari
    4. How to Contact Us
    5. Acknowledgments
      1. Colin
      2. Robin
  2. 1. Introduction
    1. Prerequisites
    2. Who This Book Is for and How to Use It
    3. What Is Efficiency?
    4. What Is Efficient R Programming?
    5. Why Efficiency?
    6. Cross-Transferable Skills for Efficiency
      1. Touch Typing
      2. Consistent Style and Code Conventions
    7. Benchmarking and Profiling
      1. Benchmarking
      2. Benchmarking Example
      3. Profiling
    8. Book Resources
      1. R Package
      2. Online Version
    9. References
  3. 2. Efficient Setup
    1. Prerequisites
    2. Top Five Tips for an Efficient R Setup
    3. Operating System
      1. Operating System and Resource Monitoring
    4. R Version
      1. Installing R
      2. Updating R
      3. Installing R Packages
      4. Installing R Packages with Dependencies
      5. Updating R Packages
    5. R Startup
      1. R Startup Arguments
      2. An Overview of R’s Startup Files
      3. The Location of Startup Files
      4. The .Rprofile File
      5. Example .Rprofile File
      6. The .Renviron File
    6. RStudio
      1. Installing and Updating RStudio
      2. Window Pane Layout
      3. RStudio Options
      4. Autocompletion
      5. Keyboard Shortcuts
      6. Object Display and Output Table
      7. Project Management
    7. BLAS and Alternative R Interpreters
      1. Testing Performance Gains from BLAS
      2. Other Interpreters
      3. Useful BLAS/Benchmarking Resources
    8. References
  4. 3. Efficient Programming
    1. Prerequisites
    2. Top Five Tips for Efficient Programming
    3. General Advice
      1. Memory Allocation
      2. Vectorized Code
    4. Communicating with the User
      1. Fatal Errors: stop()
      2. Warnings: warning()
      3. Informative Output: message() and cat()
      4. Invisible Returns
    5. Factors
      1. Inherent Order
      2. Fixed Set of Categories
    6. The Apply Family
      1. Example: Movies Dataset
      2. Type Consistency
    7. Caching Variables
      1. Function Closures
    8. The Byte Compiler
      1. Example: The Mean Function
      2. Compiling Code
    9. References
  5. 4. Efficient Workflow
    1. Prerequisites
    2. Top Five Tips for Efficient Workflow
    3. A Project Planning Typology
    4. Project Planning and Management
      1. Chunking Your Work
      2. Making Your Workflow SMART
      3. Visualizing Plans with R
    5. Package Selection
      1. Searching for R Packages
      2. How to Select a Package
    6. Publication
      1. Dynamic Documents with R Markdown
      2. R Packages
    7. Reference
  6. 5. Efficient Input/Output
    1. Prerequisites
    2. Top Five Tips for Efficient Data I/O
    3. Versatile Data Import with rio
    4. Plain-Text Formats
      1. Differences Between fread() and read_csv()
      2. Preprocessing Text Outside R
    5. Binary File Formats
      1. Native Binary Formats: Rdata or Rds?
      2. The Feather File Format
      3. Benchmarking Binary File Formats
      4. Protocol Buffers
    6. Getting Data from the Internet
    7. Accessing Data Stored in Packages
    8. References
  7. 6. Efficient Data Carpentry
    1. Prerequisites
    2. Top Five Tips for Efficient Data Carpentry
    3. Efficient Data Frames with tibble
    4. Tidying Data with tidyr and Regular Expressions
      1. Make Wide Tables Long with gather()
      2. Split Joint Variables with separate()
      3. Other tidyr Functions
      4. Regular Expressions
    5. Efficient Data Processing with dplyr
      1. Renaming Columns
      2. Changing Column Classes
      3. Filtering Rows
      4. Chaining Operations
      5. Data Aggregation
      6. Nonstandard Evaluation
    6. Combining Datasets
    7. Working with Databases
      1. Databases and dplyr
    8. Data Processing with data.table
    9. References
  8. 7. Efficient Optimization
    1. Prerequisites
    2. Top Five Tips for Efficient Optimization
    3. Code Profiling
      1. Getting Started with profvis
      2. Example: Monopoly Simulation
    4. Efficient Base R
      1. The if() Versus ifelse() Functions
      2. Sorting and Ordering
      3. Reversing Elements
      4. Which Indices are TRUE?
      5. Converting Factors to Numerics
      6. Logical AND and OR
      7. Row and Column Operations
      8. is.na() and anyNA()
      9. Matrices
    5. Example: Optimizing the move_square() Function
    6. Parallel Computing
      1. Parallel Versions of Apply Functions
      2. Example: Snakes and Ladders
      3. Exit Functions with Care
      4. Parallel Code under Linux and OS X
    7. Rcpp
      1. A Simple C++ Function
      2. The cppFunction() Command
      3. C++ Data Types
      4. The sourceCpp() Function
      5. Vectors and Loops
      6. Matrices
      7. C++ with Sugar on Top
      8. Rcpp Resources
    8. References
  9. 8. Efficient Hardware
    1. Prerequisites
    2. Top Five Tips for Efficient Hardware
    3. Background: What Is a Byte?
    4. Random Access Memory
    5. Hard Drives: HDD Versus SSD
    6. Operating Systems: 32-Bit or 64-Bit
    7. Central Processing Unit
    8. Cloud Computing
      1. Amazon EC2
  10. 9. Efficient Collaboration
    1. Prerequisites
    2. Top Five Tips for Efficient Collaboration
    3. Coding Style
      1. Reformatting Code with RStudio
      2. Filenames
      3. Loading Packages
      4. Commenting
      5. Object Names
      6. Example Package
      7. Assignment
      8. Spacing
      9. Indentation
      10. Curly Braces
    4. Version Control
      1. Commits
      2. Git Integration in RStudio
      3. GitHub
      4. Branches, Forks, Pulls, and Clones
    5. Code Review
    6. References
  11. 10. Efficient Learning
    1. Prerequisites
    2. Top Five Tips for Efficient Learning
    3. Using R’s Internal Help
      1. Searching R for Topics
      2. Finding and Using Vignettes
      3. Getting Help on Functions
      4. Reading R Source Code
      5. swirl
    4. Online Resources
      1. Stack Overflow
      2. Mailing Lists and Groups
    5. Asking a Question
      1. Minimal Dataset
      2. Minimal Example
    6. Learning In Depth
    7. Spread the Knowledge
    8. References
  12. A. Package Dependencies
  13. B. References
  14. Index