R in a Nutshell

Book description

R is one of the best tools available for data visualization and statistical computing, and this book is simply the best way to learn this open source language and environment. Practical and easy to read, R in a Nutshell demonstrates why R is increasingly popular for analyzing moderate-to-large data sets. Most books available on R are stiff and academic, but this Nutshell guide offers a readable overview of the language and includes a reference for its most commonly used features.

  • Learn the basics of the R language, such as syntax, expressions, and more

  • Analyze statistics in R using statistical tests, modeling functions, and charts

  • Discover R's graphical capabilities, including basic R graphics and lattice graphics

Scientists, researchers, and students in a variety of disciplines (from biology, chemistry, and physics to the social sciences, engineering, and web informatics for social networks and performance analysis) can perform in minutes complicated statistical analyses that would take hours with Excel. R in a Nutshell shows you how.

Table of Contents

  1. R in a Nutshell
  2. A Note Regarding Supplemental Files
  3. Preface
    1. Why I Wrote This Book
    2. When Should You Use R?
    3. R License Terms
    4. Examples
    5. How This Book Is Organized
    6. Conventions Used in This Book
    7. Using Code Examples
    8. How to Contact Us
    9. Safari® Books Online
    10. Acknowledgments
  4. I. R Basics
    1. 1. Getting and Installing R
      1. R Versions
      2. Getting and Installing Interactive R Binaries
        1. Windows
        2. Mac OS X
        3. Linux and Unix Systems
          1. Installation using package management systems
          2. Installing R from downloaded files
    2. 2. The R User Interface
      1. The R Graphical User Interface
        1. Windows
        2. Mac OS X
        3. Linux and Unix
      2. The R Console
        1. Command-Line Editing
      3. Batch Mode
      4. Using R Inside Microsoft Excel
      5. Other Ways to Run R
    3. 3. A Short R Tutorial
      1. Basic Operations in R
      2. Functions
      3. Variables
      4. Introduction to Data Structures
      5. Objects and Classes
      6. Models and Formulas
      7. Charts and Graphics
      8. Getting Help
    4. 4. R Packages
      1. An Overview of Packages
      2. Listing Packages in Local Libraries
      3. Loading Packages
        1. Loading Packages on Windows and Linux
        2. Loading Packages on Mac OS X
      4. Exploring Package Repositories
        1. Exploring Packages on the Web
        2. Finding and Installing Packages Inside R
          1. Windows and Linux GUIs
          2. Mac OS X GUI
          3. R console
          4. Installing from the command line
      5. Custom Packages
        1. Creating a Package Directory
        2. Building the Package
  5. II. The R Language
    1. 5. An Overview of the R Language
      1. Expressions
      2. Objects
      3. Symbols
      4. Functions
      5. Objects Are Copied in Assignment Statements
      6. Everything in R Is an Object
      7. Special Values
        1. NA
        2. Inf and -Inf
        3. NaN
        4. NULL
      8. Coercion
      9. The R Interpreter
      10. Seeing How R Works
    2. 6. R Syntax
      1. Constants
        1. Numeric Vectors
        2. Character Vectors
        3. Symbols
      2. Operators
        1. Order of Operations
        2. Assignments
      3. Expressions
        1. Separating Expressions
        2. Parentheses
        3. Curly Braces
      4. Control Structures
        1. Conditional Statements
        2. Loops
      5. Accessing Data Structures
        1. Data Structure Operators
        2. Indexing by Integer Vector
        3. Indexing by Logical Vector
        4. Indexing by Name
      6. R Code Style Standards
    3. 7. R Objects
      1. Primitive Object Types
      2. Vectors
      3. Lists
      4. Other Objects
        1. Matrices
        2. Arrays
        3. Factors
        4. Data Frames
        5. Formulas
        6. Time Series
        7. Shingles
        8. Dates and Times
        9. Connections
      5. Attributes
        1. Class
    4. 8. Symbols and Environments
      1. Symbols
      2. Working with Environments
      3. The Global Environment
      4. Environments and Functions
        1. Working with the Call Stack
        2. Evaluating Functions in Different Environments
        3. Adding Objects to an Environment
      5. Exceptions
        1. Signaling Errors
        2. Catching Errors
    5. 9. Functions
      1. The Function Keyword
      2. Arguments
      3. Return Values
      4. Functions As Arguments
        1. Anonymous Functions
        2. Properties of Functions
      5. Argument Order and Named Arguments
      6. Side Effects
        1. Changes to Other Environments
        2. Input/Output
        3. Graphics
    6. 10. Object-Oriented Programming
      1. Overview of Object-Oriented Programming in R
        1. Key Ideas
        2. Implementation Example
      2. Object-Oriented Programming in R: S4 Classes
        1. Defining Classes
        2. New Objects
        3. Accessing Slots
        4. Working with Objects
        5. Creating Coercion Methods
        6. Methods
        7. Managing Methods
        8. Basic Classes
        9. More Help
      3. Old-School OOP in R: S3
        1. S3 Classes
        2. S3 Methods
        3. Using S3 Classes in S4 Classes
        4. Finding Hidden S3 Methods
    7. 11. High-Performance R
      1. Use Built-in Math Functions
      2. Use Environments for Lookup Tables
      3. Use a Database to Query Large Data Sets
      4. Preallocate Memory
      5. Monitor How Much Memory You Are Using
        1. Monitoring Memory Usage
        2. Increasing Memory Limits
        3. Cleaning Up Objects
      6. Functions for Big Data Sets
      7. Parallel Computation with R
      8. High-Performance R Binaries
        1. Revolution R
        2. Building Your Own
          1. Building on Microsoft Windows
          2. Building R on Unix-like systems
          3. Building R on Mac OS X
  6. III. Working with Data
    1. 12. Saving, Loading, and Editing Data
      1. Entering Data Within R
        1. Entering Data Using R Commands
        2. Using the Edit GUI
          1. Windows Data Editor
          2. Mac OS X Data Editor
          3. X Windows (Linux) Data Editor
      2. Saving and Loading R Objects
        1. Saving Objects with save
      3. Importing Data from External Files
        1. Text Files
          1. Delimited files
          2. Fixed-width files
          3. Other functions to parse data
        2. Other Software
      4. Exporting Data
      5. Importing Data from Databases
        1. Export Then Import
        2. Database Connection Packages
        3. RODBC
          1. Getting RODBC working
            1. Installing the RODBC package
            2. Installing ODBC drivers
            3. Example: SQLite ODBC on Mac OS X
            4. Example: SQLite ODBC on Windows
          2. Using RODBC
            1. Opening a channel
            2. Getting information about the database
            3. Getting data
            4. Closing a channel
        4. DBI
          1. Opening a connection
          2. Getting DB information
          3. Querying the database
          4. Cleaning up
        5. TSDBI
    2. 13. Preparing Data
      1. Combining Data Sets
        1. Pasting Together Data Structures
          1. Paste
          2. rbind and cbind
          3. An extended example
        2. Merging Data by Common Fields
      2. Transformations
        1. Reassigning Variables
        2. The Transform Function
        3. Applying a Function to Each Element of an Object
          1. Applying a function to an array
          2. Applying a function to a list or vector
      3. Binning Data
        1. Shingles
        2. Cut
        3. Combining Objects with a Grouping Variable
      4. Subsets
        1. Bracket Notation
        2. subset Function
        3. Random Sampling
      5. Summarizing Functions
        1. tapply, aggregate
        2. Aggregating Tables with rowsum
        3. Counting Values
        4. Reshaping Data
          1. Transposing matrices and data frames
          2. Reshaping data frames and matrices
      6. Data Cleaning
      7. Finding and Removing Duplicates
      8. Sorting
    3. 14. Graphics
      1. An Overview of R Graphics
        1. Scatter Plots
        2. Plotting Time Series
        3. Bar Charts
        4. Pie Charts
        5. Plotting Categorical Data
        6. Three-Dimensional Data
        7. Plotting Distributions
        8. Box Plots
      2. Graphics Devices
      3. Customizing Charts
        1. Common Arguments to Chart Functions
        2. Graphical Parameters
          1. Annotation
          2. Margins
          3. Multiple plots
          4. Text properties
            1. Text size
            2. Typeface
            3. Alignment and spacing
            4. Rotation
          5. Line properties
          6. Colors
          7. Axes
          8. Points
          9. Graphical parameter by name
        3. Basic Graphics Functions
          1. points
          2. lines
          3. curve
          4. text
          5. abline
          6. polygon
          7. segments
          8. legend
          9. title
          10. axis
          11. box
          12. mtext
          13. trans3d
    4. 15. Lattice Graphics
      1. History
      2. An Overview of the Lattice Package
        1. How Lattice Works
        2. A Simple Example
        3. Using Lattice Functions
        4. Custom Panel Functions
      3. High-Level Lattice Plotting Functions
        1. Univariate Trellis Plots
          1. Bar charts
          2. Dot plots
          3. Histograms
          4. Density plots
          5. Strip plots
          6. Univariate quantile-quantile plots
        2. Bivariate Trellis Plots
          1. Scatter plots
          2. Box plots in lattice
          3. Scatter plot matrices
          4. Bivariate quantile-quantile plots
        3. Trivariate Plots
          1. Level plots
          2. Contour plots
          3. Cloud plots
          4. Wire-frame plots
        4. Other Plots
      4. Customizing Lattice Graphics
        1. Common Arguments to Lattice Functions
        2. trellis.skeleton
        3. Controlling How Axes Are Drawn
        4. Parameters
        5. plot.trellis
        6. strip.default
        7. simpleKey
      5. Low-Level Functions
        1. Low-Level Graphics Functions
        2. Panel Functions
  7. IV. Statistics with R
    1. 16. Analyzing Data
      1. Summary Statistics
      2. Correlation and Covariance
      3. Principal Components Analysis
      4. Factor Analysis
      5. Bootstrap Resampling
    2. 17. Probability Distributions
      1. Normal Distribution
      2. Common Distribution-Type Arguments
      3. Distribution Function Families
    3. 18. Statistical Tests
      1. Continuous Data
        1. Normal Distribution-Based Tests
          1. Comparing means
          2. Comparing paired data
          3. Comparing variances of two populations
          4. Comparing means across more than two groups
          5. Pairwise t-tests between multiple groups
          6. Testing for normality
          7. Testing if a data vector came from an arbitrary distribution
          8. Testing if two data vectors came from the same distribution
          9. Correlation tests
        2. Non-Parametric Tests
          1. Comparing two means
          2. Comparing more than two means
          3. Comparing variances
          4. Difference in scale parameters
      2. Discrete Data
        1. Proportion Tests
        2. Binomial Tests
        3. Tabular Data Tests
        4. Non-Parametric Tabular Data Tests
    4. 19. Power Tests
      1. Experimental Design Example
      2. t-Test Design
      3. Proportion Test Design
      4. ANOVA Test Design
    5. 20. Regression Models
      1. Example: A Simple Linear Model
        1. Fitting a Model
        2. Helper Functions for Specifying the Model
        3. Getting Information About a Model
          1. Viewing the model
          2. Predicting values using a model
          3. Analyzing the fit
        4. Refining the Model
      2. Details About the lm Function
        1. Assumptions of Least Squares Regression
        2. Robust and Resistant Regression
          1. Resistant regression
          2. Robust regression
          3. Comparing lm, lqs, and rlm
      3. Subset Selection and Shrinkage Methods
        1. Stepwise Variable Selection
        2. Ridge Regression
        3. Lasso and Least Angle Regression
        4. Principal Components Regression and Partial Least Squares Regression
      4. Nonlinear Models
        1. Generalized Linear Models
        2. Nonlinear Least Squares
      5. Survival Models
      6. Smoothing
        1. Splines
        2. Fitting Polynomial Surfaces
        3. Kernel Smoothing
      7. Machine Learning Algorithms for Regression
        1. Regression Tree Models
          1. Recursive partitioning trees
          2. Patient rule induction method
          3. Bagging for regression
          4. Boosting for regression
          5. Random forests for regression
        2. MARS
        3. Neural Networks
        4. Projection Pursuit Regression
        5. Generalized Additive Models
        6. Support Vector Machines
    6. 21. Classification Models
      1. Linear Classification Models
        1. Logistic Regression
        2. Linear Discriminant Analysis
        3. Log-Linear Models
      2. Machine Learning Algorithms for Classification
        1. k Nearest Neighbors
        2. Classification Tree Models
          1. Bagging
          2. Boosting
        3. Neural Networks
        4. SVMs
        5. Random Forests
    7. 22. Machine Learning
      1. Market Basket Analysis
      2. Clustering
        1. Distance Measures
        2. Clustering Algorithms
    8. 23. Time Series Analysis
      1. Autocorrelation Functions
      2. Time Series Models
    9. 24. Bioconductor
      1. An Example
        1. Loading Raw Expression Data
        2. Loading Data from GEO
        3. Matching Phenotype Data
        4. Analyzing Expression Data
      2. Key Bioconductor Packages
      3. Data Structures
        1. eSet
        2. AssayData
        3. AnnotatedDataFrame
        4. MIAME
        5. Other Classes Used by Bioconductor Packages
      4. Where to Go Next
        1. Resources Outside Bioconductor
        2. Vignettes
        3. Courses
        4. Books
  8. A. R Reference
    1. base
      1. Functions
      2. Data Sets
    2. boot
      1. Functions
      2. Data Sets
    3. class
      1. Functions
    4. cluster
      1. Functions
      2. Data Sets
    5. codetools
    6. foreign
      1. Functions
    7. grDevices
      1. Functions
      2. Data Sets
    8. graphics
      1. Functions
    9. grid
    10. KernSmooth
      1. Functions
    11. lattice
      1. Functions
      2. Data Sets
    12. MASS
      1. Functions
      2. Data Sets
    13. methods
      1. Functions
    14. mgcv
    15. nlme
    16. nnet
      1. Functions
    17. rpart
      1. Functions
      2. Data Sets
    18. spatial
      1. Functions
    19. splines
      1. Functions
    20. stats
      1. Functions
      2. Data Set
    21. stats4
      1. Functions
    22. survival
      1. Functions
      2. Data Sets
    23. tcltk
    24. tools
      1. Functions
      2. Data Sets
    25. utils
      1. Functions
  9. Bibliography
  10. Index
  11. About the Author
  12. Colophon
  13. Copyright