The Art of R Programming

Book description

R is the world's most popular language for developing statistical software: Archaeologists use it to track the spread of ancient civilizations, drug companies use it to discover which medications are safe and effective, and actuaries use it to assess financial risks and keep economies running smoothly.

The Art of R Programming takes you on a guided tour of software development with R, from basic types and data structures to advanced topics like closures, recursion, and anonymous functions. No statistical knowledge is required, and your programming skills can range from hobbyist to pro.

Along the way, you'll learn about functional and object-oriented programming, running mathematical simulations, and rearranging complex data into simpler, more useful formats. You'll also learn to:

• Create artful graphs to visualize complex data sets and functions
• Write more efficient code using parallel R and vectorization
• Interface R with C/C++ and Python for increased speed or functionality
• Find new R packages for text analysis, image manipulation, and more
• Squash annoying bugs with advanced debugging techniques

Whether you're designing aircraft, forecasting the weather, or you just need to tame your data, The Art of R Programming is your guide to harnessing the power of statistical computing.

Table of contents

  1. Cover Page
  2. Title Page
  3. Copyright Page
  4. BRIEF CONTENTS
  5. CONTENTS IN DETAIL
  6. Acknowledgments
  7. Introduction
    1. Why Use R for Your Statistical Work?
      1. Object-Oriented Programming
      2. Functional Programming
    2. Whom Is This Book For?
    3. My Own Background
  8. 1. Getting Started
    1. How to Run R
      1. Interactive Mode
      2. Batch Mode
    2. A First R Session
    3. Introduction to Functions
      1. Variable Scope
      2. Default Arguments
    4. Preview of Some Important R Data Structures
      1. Vectors, the R Workhorse
        1. Scalars
      2. Character Strings
      3. Matrices
      4. Lists
      5. Data Frames
      6. Classes
    5. Extended Example: Regression Analysis of Exam Grades
    6. Startup and Shutdown
    7. Getting Help
      1. The help() Function
      2. The example() Function
      3. If You Don’t Know Quite What You’re Looking For
      4. Help for Other Topics
      5. Help for Batch Mode
      6. Help on the Internet
  9. 2. Vectors
    1. Scalars, Vectors, Arrays, and Matrices
      1. Adding and Deleting Vector Elements
      2. Obtaining the Length of a Vector
      3. Matrices and Arrays as Vectors
    2. Declarations
    3. Recycling
    4. Common Vector Operations
      1. Vector Arithmetic and Logical Operations
      2. Vector Indexing
      3. Generating Useful Vectors with the : Operator
      4. Generating Vector Sequences with seq()
      5. Repeating Vector Constants with rep()
    5. Using all() and any()
      1. Extended Example: Finding Runs of Consecutive Ones
      2. Extended Example: Predicting Discrete-Valued Time Series
    6. Vectorized Operations
      1. Vector In, Vector Out
      2. Vector In, Matrix Out
    7. NA and NULL Values
      1. Using NA
      2. Using NULL
    8. Filtering
      1. Generating Filtering Indices
      2. Filtering with the subset() Function
      3. The Selection Function which()
    9. A Vectorized if-then-else: The ifelse() Function
      1. Extended Example: A Measure of Association
      2. Extended Example: Recoding an Abalone Data Set
    10. Testing Vector Equality
    11. Vector Element Names
    12. More on c()
  10. 3. Matrices and Arrays
    1. Creating Matrices
    2. General Matrix Operations
      1. Performing Linear Algebra Operations on Matrices
      2. Matrix Indexing
      3. Extended Example: Image Manipulation
      4. Filtering on Matrices
      5. Extended Example: Generating a Covariance Matrix
    3. Applying Functions to Matrix Rows and Columns
      1. Using the apply() Function
      2. Extended Example: Finding Outliers
    4. Adding and Deleting Matrix Rows and Columns
      1. Changing the Size of a Matrix
      2. Extended Example: Finding the Closest Pair of Vertices in a Graph
    5. More on the Vector/Matrix Distinction
    6. Avoiding Unintended Dimension Reduction
    7. Naming Matrix Rows and Columns
    8. Higher-Dimensional Arrays
  11. 4. Lists
    1. Creating Lists
    2. General List Operations
      1. List Indexing
      2. Adding and Deleting List Elements
      3. Getting the Size of a List
      4. Extended Example: Text Concordance
    3. Accessing List Components and Values
    4. Applying Functions to Lists
      1. Using the lapply() and sapply() Functions
      2. Extended Example: Text Concordance, Continued
      3. Extended Example: Back to the Abalone Data
    5. Recursive Lists
  12. 5. Data Frames
    1. Creating Data Frames
      1. Accessing Data Frames
      2. Extended Example: Regression Analysis of Exam Grades, Continued
    2. Other Matrix-Like Operations
      1. Extracting Subdata Frames
      2. More on Treatment of NA Values
      3. Using the rbind() and cbind() Functions and Alternatives
      4. Applying apply()
      5. Extended Example: A Salary Study
    3. Merging Data Frames
      1. Extended Example: An Employee Database
    4. Applying Functions to Data Frames
      1. Using lapply() and sapply() on Data Frames
      2. Extended Example: Applying Logistic Regression Models
      3. Extended Example: Aids for Learning Chinese Dialects
  13. 6. Factors and Tables
    1. Factors and Levels
    2. Common Functions Used with Factors
      1. The tapply() Function
      2. The split() Function
      3. The by() Function
    3. Working with Tables
      1. Matrix/Array-Like Operations on Tables
      2. Extended Example: Extracting a Subtable
      3. Extended Example: Finding the Largest Cells in a Table
    4. Other Factor- and Table-Related Functions
      1. The aggregate() Function
      2. The cut() Function
  14. 7. R Programming Structures
    1. Control Statements
      1. Loops
      2. Looping Over Nonvector Sets
      3. if-else
    2. Arithmetic and Boolean Operators and Values
    3. Default Values for Arguments
    4. Return Values
      1. Deciding Whether to Explicitly Call return()
      2. Returning Complex Objects
    5. Functions Are Objects
    6. Environment and Scope Issues
      1. The Top-Level Environment
      2. The Scope Hierarchy
      3. More on ls()
      4. Functions Have (Almost) No Side Effects
      5. Extended Example: A Function to Display the Contents of a Call Frame
    7. No Pointers in R
    8. Writing Upstairs
      1. Writing to Nonlocals with the Superassignment Operator
      2. Writing to Nonlocals with assign()
      3. Extended Example: Discrete-Event Simulation in R
      4. When Should You Use Global Variables?
      5. Closures
    9. Recursion
      1. A Quicksort Implementation
      2. Extended Example: A Binary Search Tree
    10. Replacement Functions
      1. What’s Considered a Replacement Function?
      2. Extended Example: A Self-Bookkeeping Vector Class
    11. Tools for Composing Function Code
      1. Text Editors and Integrated Development Environments
      2. The edit() Function
    12. Writing Your Own Binary Operations
    13. Anonymous Functions
  15. 8. Doing Math and Simulations in R
    1. Math Functions
      1. Extended Example: Calculating a Probability
      2. Cumulative Sums and Products
      3. Minima and Maxima
      4. Calculus
    2. Functions for Statistical Distributions
    3. Sorting
    4. Linear Algebra Operations on Vectors and Matrices
      1. Extended Example: Vector Cross Product
      2. Extended Example: Finding Stationary Distributions of Markov Chains
    5. Set Operations
    6. Simulation Programming in R
      1. Built-In Random Variate Generators
      2. Obtaining the Same Random Stream in Repeated Runs
      3. Extended Example: A Combinatorial Simulation
  16. 9. Object-Oriented Programming
    1. S3 Classes
      1. S3 Generic Functions
      2. Example: OOP in the lm() Linear Model Function
      3. Finding the Implementations of Generic Methods
      4. Writing S3 Classes
      5. Using Inheritance
      6. Extended Example: A Class for Storing Upper-Triangular Matrices
      7. Extended Example: A Procedure for Polynomial Regression
    2. S4 Classes
      1. Writing S4 Classes
      2. Implementing a Generic Function on an S4 Class
    3. S3 Versus S4
    4. Managing Your Objects
      1. Listing Your Objects with the ls() Function
      2. Removing Specific Objects with the rm() Function
      3. Saving a Collection of Objects with the save() Function
      4. “What Is This?”
      5. The exists() Function
  17. 10. Input/Output
    1. Accessing the Keyboard and Monitor
      1. Using the scan() Function
      2. Using the readline() Function
      3. Printing to the Screen
    2. Reading and Writing Files
      1. Reading a Data Frame or Matrix from a File
      2. Reading Text Files
      3. Introduction to Connections
      4. Extended Example: Reading PUMS Census Files
      5. Accessing Files on Remote Machines via URLs
      6. Writing to a File
      7. Getting File and Directory Information
      8. Extended Example: Sum the Contents of Many Files
    3. Accessing the Internet
      1. Overview of TCP/IP
      2. Sockets in R
      3. Extended Example: Implementing Parallel R
  18. 11. String Manipulation
    1. An Overview of String-Manipulation Functions
      1. grep()
      2. nchar()
      3. paste()
      4. sprintf()
      5. substr()
      6. strsplit()
      7. regexpr()
      8. gregexpr()
    2. Regular Expressions
      1. Extended Example: Testing a Filename for a Given Suffix
      2. Extended Example: Forming Filenames
    3. Use of String Utilities in the edtdbg Debugging Tool
  19. 12. Graphics
    1. Creating Graphs
      1. The Workhorse of R Base Graphics: The plot() Function
      2. Adding Lines: The abline() Function
      3. Starting a New Graph While Keeping the Old Ones
      4. Extended Example: Two Density Estimates on the Same Graph
      5. Extended Example: More on the Polynomial Regression Example
      6. Adding Points: The points() Function
      7. Adding a Legend: The legend() Function
      8. Adding Text: The text() Function
      9. Pinpointing Locations: The locator() Function
      10. Restoring a Plot
    2. Customizing Graphs
      1. Changing Character Sizes: The cex Option
      2. Changing the Range of Axes: The xlim and ylim Options
      3. Adding a Polygon: The polygon() Function
      4. Smoothing Points: The lowess() and loess() Functions
      5. Graphing Explicit Functions
      6. Extended Example: Magnifying a Portion of a Curve
    3. Saving Graphs to Files
      1. R Graphics Devices
      2. Saving the Displayed Graph
      3. Closing an R Graphics Device
    4. Creating Three-Dimensional Plots
  20. 13. Debugging
    1. Fundamental Principles of Debugging
      1. The Essence of Debugging: The Principle of Confirmation
      2. Start Small
      3. Debug in a Modular, Top-Down Manner
      4. Antibugging
    2. Why Use a Debugging Tool?
    3. Using R Debugging Facilities
      1. Single-Stepping with the debug() and browser() Functions
      2. Using Browser Commands
      3. Setting Breakpoints
        1. Calling browser() Directly
        2. Using the setBreakpoint() Function
      4. Tracking with the trace() Function
      5. Performing Checks After a Crash with the traceback() and debugger() Functions
      6. Extended Example: Two Full Debugging Sessions
        1. Debugging Finding Runs of Ones
        2. Debugging Finding City Pairs
    4. Moving Up in the World: More Convenient Debugging Tools
    5. Ensuring Consistency in Debugging Simulation Code
    6. Syntax and Runtime Errors
    7. Running GDB on R Itself
  21. 14. Performance Enhancement: Speed and Memory
    1. Writing Fast R Code
    2. The Dreaded for Loop
      1. Vectorization for Speedup
      2. Extended Example: Achieving Better Speed in a Monte Carlo Simulation
      3. Extended Example: Generating a Powers Matrix
    3. Functional Programming and Memory Issues
      1. Vector Assignment Issues
      2. Copy-on-Change Issues
      3. Extended Example: Avoiding Memory Copy
    4. Using Rprof() to Find Slow Spots in Your Code
      1. Monitoring with Rprof()
      2. How Rprof() Works
    5. Byte Code Compilation
    6. Oh No, the Data Doesn’t Fit into Memory!
      1. Chunking
      2. Using R Packages for Memory Management
  22. 15. Interfacing R to Other Languages
    1. Writing C/C++ Functions to Be Called from R
      1. Some R-to-C/C++ Preliminaries
      2. Example: Extracting Subdiagonals from a Square Matrix
      3. Compiling and Running Code
      4. Debugging R/C Code
      5. Extended Example: Prediction of Discrete-Valued Time Series
    2. Using R from Python
      1. Installing RPy
      2. RPy Syntax
  23. 16. Parallel R
    1. The Mutual Outlinks Problem
    2. Introducing the snow Package
      1. Running snow Code
      2. Analyzing the snow Code
      3. How Much Speedup Can Be Attained?
      4. Extended Example: K-Means Clustering
    3. Resorting to C
      1. Using Multicore Machines
      2. Extended Example: Mutual Outlinks Problem in OpenMP
      3. Running the OpenMP Code
      4. OpenMP Code Analysis
      5. Other OpenMP Pragmas
      6. The omp barrier Pragma
      7. The omp critical Pragma
      8. The omp single Pragma
      9. GPU Programming
    4. General Performance Considerations
      1. Sources of Overhead
      2. Shared-Memory Machines
      3. Networked Systems of Computers
      4. Embarrassingly Parallel Applications and Those That Aren’t
      5. Static Versus Dynamic Task Assignment
      6. Software Alchemy: Turning General Problems into Embarrassingly Parallel Ones
    5. Debugging Parallel R Code
  24. A. Installing R
    1. Downloading R from CRAN
    2. Installing from a Linux Package Manager
    3. Installing from Source
  25. B. Installing and Using Packages
    1. Package Basics
    2. Loading a Package from Your Hard Drive
    3. Downloading a Package from the Web
      1. Installing Packages Automatically
      2. Installing Packages Manually
    4. Listing the Functions in a Package
  26. Index
  27. Colophon
  28. About the Author

Product information

  • Title: The Art of R Programming
  • Author(s): Norman Matloff
  • Release date: October 2011
  • Publisher(s): No Starch Press
  • ISBN: 9781593273842