The Book of R

Book Description

The Book of R teaches statistics and programming in R for beginners.

Table of Contents

  1. Cover
  2. Title
  3. Copyright
  4. Brief Contents
    1. A Brief History of R
    2. About This Book
      1. Part I: The Language
      2. Part II: Programming
      3. Part III: Statistics and Probability
      4. Part IV: Statistical Testing and Modeling
      5. Part V: Advanced Graphics
    3. For Students
    4. For Instructors
    1. 1.1 Obtaining and Installing R from CRAN
    2. 1.2 Opening R for the First Time
      1. 1.2.1 Console and Editor Panes
      2. 1.2.2 Comments
      3. 1.2.3 Working Directory
      4. 1.2.4 Installing and Loading R Packages
      5. 1.2.5 Help Files and Function Documentation
      6. 1.2.6 Third-Party Editors
    3. 1.3 Saving Work and Exiting R
      1. 1.3.1 Workspaces
      2. 1.3.2 Scripts
    4. 1.4 Conventions
      1. 1.4.1 Coding
      2. 1.4.2 Math and Equation References
      3. 1.4.3 Exercises
      4. Exercise 1.1
    1. 2.1 R for Basic Math
      1. 2.1.1 Arithmetic
      2. 2.1.2 Logarithms and Exponentials
      3. 2.1.3 E-Notation
      4. Exercise 2.1
    2. 2.2 Assigning Objects
      1. Exercise 2.2
    3. 2.3 Vectors
      1. 2.3.1 Creating a Vector
      2. 2.3.2 Sequences, Repetition, Sorting, and Lengths
      3. Exercise 2.3
      4. 2.3.3 Subsetting and Element Extraction
      5. Exercise 2.4
      6. 2.3.4 Vector-Oriented Behavior
      7. Exercise 2.5
    1. 3.1 Defining a Matrix
      1. 3.1.1 Filling Direction
      2. 3.1.2 Row and Column Bindings
      3. 3.1.3 Matrix Dimensions
    2. 3.2 Subsetting
      1. 3.2.1 Row, Column, and Diagonal Extractions
      2. 3.2.2 Omitting and Overwriting
      3. Exercise 3.1
    3. 3.3 Matrix Operations and Algebra
      1. 3.3.1 Matrix Transpose
      2. 3.3.2 Identity Matrix
      3. 3.3.3 Scalar Multiple of a Matrix
      4. 3.3.4 Matrix Addition and Subtraction
      5. 3.3.5 Matrix Multiplication
      6. 3.3.6 Matrix Inversion
      7. Exercise 3.2
    4. 3.4 Multidimensional Arrays
      1. 3.4.1 Definition
      2. 3.4.2 Subsets, Extractions, and Replacements
      3. Exercise 3.3
    1. 4.1 Logical Values
      1. 4.1.1 TRUE or FALSE?
      2. 4.1.2 A Logical Outcome: Relational Operators
      3. Exercise 4.1
      4. 4.1.3 Multiple Comparisons: Logical Operators
      5. Exercise 4.2
      6. 4.1.4 Logicals Are Numbers!
      7. 4.1.5 Logical Subsetting and Extraction
      8. Exercise 4.3
    2. 4.2 Characters
      1. 4.2.1 Creating a String
      2. 4.2.2 Concatenation
      3. 4.2.3 Escape Sequences
      4. 4.2.4 Substrings and Matching
      5. Exercise 4.4
    3. 4.3 Factors
      1. 4.3.1 Identifying Categories
      2. 4.3.2 Defining and Ordering Levels
      3. 4.3.3 Combining and Cutting
      4. Exercise 4.5
    1. 5.1 Lists of Objects
      1. 5.1.1 Definition and Component Access
      2. 5.1.2 Naming
      3. 5.1.3 Nesting
      4. Exercise 5.1
    2. 5.2 Data Frames
      1. 5.2.1 Construction
      2. 5.2.2 Adding Data Columns and Combining Data Frames
      3. 5.2.3 Logical Record Subsets
      4. Exercise 5.2
    1. 6.1 Some Special Values
      1. 6.1.1 Infinity
      2. 6.1.2 NaN
      3. Exercise 6.1
      4. 6.1.3 NA
      5. 6.1.4 NULL
      6. Exercise 6.2
    2. 6.2 Understanding Types, Classes, and Coercion
      1. 6.2.1 Attributes
      2. 6.2.2 Object Class
      3. 6.2.3 Is-Dot Object-Checking Functions
      4. 6.2.4 As-Dot Coercion Functions
      5. Exercise 6.3
    1. 7.1 Using plot with Coordinate Vectors
    2. 7.2 Graphical Parameters
      1. 7.2.1 Automatic Plot Types
      2. 7.2.2 Title and Axis Labels
      3. 7.2.3 Color
      4. 7.2.4 Line and Point Appearances
      5. 7.2.5 Plotting Region Limits
    3. 7.3 Adding Points, Lines, and Text to an Existing Plot
      1. Exercise 7.1
    4. 7.4 The ggplot2 Package
      1. 7.4.1 A Quick Plot with qplot
      2. 7.4.2 Setting Appearance Constants with Geoms
      3. 7.4.3 Aesthetic Mapping with Geoms
      4. Exercise 7.2
    1. 8.1 R-Ready Data Sets
      1. 8.1.1 Built-in Data Sets
      2. 8.1.2 Contributed Data Sets
    2. 8.2 Reading in External Data Files
      1. 8.2.1 The Table Format
      2. 8.2.2 Spreadsheet Workbooks
      3. 8.2.3 Web-Based Files
      4. 8.2.4 Other File Formats
    3. 8.3 Writing Out Data Files and Plots
      1. 8.3.1 Data Sets
      2. 8.3.2 Plots and Graphics Files
    4. 8.4 Ad Hoc Object Read/Write Operations
      1. Exercise 8.1
    1. 9.1 Scoping
      1. 9.1.1 Environments
      2. 9.1.2 Search Path
      3. 9.1.3 Reserved and Protected Names
      4. Exercise 9.1
    2. 9.2 Argument Matching
      1. 9.2.1 Exact
      2. 9.2.2 Partial
      3. 9.2.3 Positional
      4. 9.2.4 Mixed
      5. 9.2.5 Dot-Dot-Dot: Use of Ellipses
      6. Exercise 9.2
    1. 10.1 if Statements
      1. 10.1.1 Stand-Alone Statement
      2. 10.1.2 else Statements
      3. 10.1.3 Using ifelse for Element-wise Checks
      4. Exercise 10.1
      5. 10.1.4 Nesting and Stacking Statements
      6. 10.1.5 The switch Function
      7. Exercise 10.2
    2. 10.2 Coding Loops
      1. 10.2.1 for Loops
      2. Exercise 10.3
      3. 10.2.2 while Loops
      4. Exercise 10.4
      5. 10.2.3 Implicit Looping with apply
      6. Exercise 10.5
    3. 10.3 Other Control Flow Mechanisms
      1. 10.3.1 Declaring break or next
      2. 10.3.2 The repeat Statement
      3. Exercise 10.6
    1. 11.1 The function Command
      1. 11.1.1 Function Creation
      2. 11.1.2 Using return
      3. Exercise 11.1
    2. 11.2 Arguments
      1. 11.2.1 Lazy Evaluation
      2. 11.2.2 Setting Defaults
      3. 11.2.3 Checking for Missing Arguments
      4. 11.2.4 Dealing with Ellipses
      5. Exercise 11.2
    3. 11.3 Specialized Functions
      1. 11.3.1 Helper Functions
      2. 11.3.2 Disposable Functions
      3. 11.3.3 Recursive Functions
      4. Exercise 11.3
    1. 12.1 Exception Handling
      1. 12.1.1 Formal Notifications: Errors and Warnings
      2. 12.1.2 Catching Errors with try Statements
      3. Exercise 12.1
    2. 12.2 Progress and Timing
      1. 12.2.1 Textual Progress Bars: Are We There Yet?
      2. 12.2.2 Measuring Completion Time: How Long Did It Take?
      3. Exercise 12.2
    3. 12.3 Masking
      1. 12.3.1 Function and Object Distinction
      2. 12.3.2 Data Frame Variable Distinction
    1. 13.1 Describing Raw Data
      1. 13.1.1 Numeric Variables
      2. 13.1.2 Categorical Variables
      3. 13.1.3 Univariate and Multivariate Data
      4. 13.1.4 Parameter or Statistic?
      5. Exercise 13.1
    2. 13.2 Summary Statistics
      1. 13.2.1 Centrality: Mean, Median, Mode
      2. 13.2.2 Counts, Percentages, and Proportions
      3. Exercise 13.2
      4. 13.2.3 Quantiles, Percentiles, and the Five-Number Summary
      5. 13.2.4 Spread: Variance, Standard Deviation, and the Interquartile Range
      6. Exercise 13.3
      7. 13.2.5 Covariance and Correlation
      8. 13.2.6 Outliers
      9. Exercise 13.4
    1. 14.1 Barplots and Pie Charts
      1. 14.1.1 Building a Barplot
      2. 14.1.2 A Quick Pie Chart
    2. 14.2 Histograms
    3. 14.3 Box-and-Whisker Plots
      1. 14.3.1 Stand-Alone Boxplots
      2. 14.3.2 Side-by-Side Boxplots
    4. 14.4 Scatterplots
      1. 14.4.1 Single Plot
      2. 14.4.2 Matrix of Plots
      3. Exercise 14.1
    1. 15.1 What Is a Probability?
      1. 15.1.1 Events and Probability
      2. 15.1.2 Conditional Probability
      3. 15.1.3 Intersection
      4. 15.1.4 Union
      5. 15.1.5 Complement
      6. Exercise 15.1
    2. 15.2 Random Variables and Probability Distributions
      1. 15.2.1 Realizations
      2. 15.2.2 Discrete Random Variables
      3. 15.2.3 Continuous Random Variables
      4. 15.2.4 Shape, Skew, and Modality
      5. Exercise 15.2
    1. 16.1 Common Probability Mass Functions
      1. 16.1.1 Bernoulli Distribution
      2. 16.1.2 Binomial Distribution
      3. Exercise 16.1
      4. 16.1.3 Poisson Distribution
      5. Exercise 16.2
      6. 16.1.4 Other Mass Functions
    2. 16.2 Common Probability Density Functions
      1. 16.2.1 Uniform
      2. Exercise 16.3
      3. 16.2.2 Normal
      4. Exercise 16.4
      5. 16.2.3 Student’s t-distribution
      6. 16.2.4 Exponential
      7. Exercise 16.5
      8. 16.2.5 Other Density Functions
    1. 17.1 Sampling Distributions
      1. 17.1.1 Distribution for a Sample Mean
      2. 17.1.2 Distribution for a Sample Proportion
      3. Exercise 17.1
      4. 17.1.3 Sampling Distributions for Other Statistics
    2. 17.2 Confidence Intervals
      1. 17.2.1 An Interval for a Mean
      2. 17.2.2 An Interval for a Proportion
      3. 17.2.3 Other Intervals
      4. 17.2.4 Comments on Interpretation of a CI
      5. Exercise 17.2
    1. 18.1 Components of a Hypothesis Test
      1. 18.1.1 Hypotheses
      2. 18.1.2 Test Statistic
      3. 18.1.3 p-value
      4. 18.1.4 Significance Level
      5. 18.1.5 Criticisms of Hypothesis Testing
    2. 18.2 Testing Means
      1. 18.2.1 Single Mean
      2. Exercise 18.1
      3. 18.2.2 Two Means
      4. Exercise 18.2
    3. 18.3 Testing Proportions
      1. 18.3.1 Single Proportion
      2. 18.3.2 Two Proportions
      3. Exercise 18.3
    4. 18.4 Testing Categorical Variables
      1. 18.4.1 Single Categorical Variable
      2. 18.4.2 Two Categorical Variables
      3. Exercise 18.4
    5. 18.5 Errors and Power
      1. 18.5.1 Hypothesis Test Errors
      2. 18.5.2 Type I Errors
      3. 18.5.3 Type II Errors
      4. Exercise 18.5
      5. 18.5.4 Statistical Power
      6. Exercise 18.6
    1. 19.1 One-Way ANOVA
      1. 19.1.1 Hypotheses and Diagnostic Checking
      2. 19.1.2 One-Way ANOVA Table Construction
      3. 19.1.3 Building ANOVA Tables with the aov Function
      4. Exercise 19.1
    2. 19.2 Two-Way ANOVA
      1. 19.2.1 A Suite of Hypotheses
      2. 19.2.2 Main Effects and Interactions
    3. 19.3 Kruskal-Wallis Test
      1. Exercise 19.2
    1. 20.1 An Example of a Linear Relationship
    2. 20.2 General Concepts
      1. 20.2.1 Definition of the Model
      2. 20.2.2 Estimating the Intercept and Slope Parameters
      3. 20.2.3 Fitting Linear Models with lm
      4. 20.2.4 Illustrating Residuals
    3. 20.3 Statistical Inference
      1. 20.3.1 Summarizing the Fitted Model
      2. 20.3.2 Regression Coefficient Significance Tests
      3. 20.3.3 Coefficient of Determination
      4. 20.3.4 Other summary Output
    4. 20.4 Prediction
      1. 20.4.1 Confidence Interval or Prediction Interval?
      2. 20.4.2 Interpreting Intervals
      3. 20.4.3 Plotting Intervals
      4. 20.4.4 Interpolation vs. Extrapolation
      5. Exercise 20.1
    5. 20.5 Understanding Categorical Predictors
      1. 20.5.1 Binary Variables: k = 2
      2. 20.5.2 Multilevel Variables: k > 2
      3. 20.5.3 Changing the Reference Level
      4. 20.5.4 Treating Categorical Variables as Numeric
      5. 20.5.5 Equivalence with One-Way ANOVA
      6. Exercise 20.2
    1. 21.1 Terminology
    2. 21.2 Theory
      1. 21.2.1 Extending the Simple Model to a Multiple Model
      2. 21.2.2 Estimating in Matrix Form
      3. 21.2.3 A Basic Example
    3. 21.3 Implementing in R and Interpreting
      1. 21.3.1 Additional Predictors
      2. 21.3.2 Interpreting Marginal Effects
      3. 21.3.3 Visualizing the Multiple Linear Model
      4. 21.3.4 Finding Confidence Intervals
      5. 21.3.5 Omnibus F-Test
      6. 21.3.6 Predicting from a Multiple Linear Model
      7. Exercise 21.1
    4. 21.4 Transforming Numeric Variables
      1. 21.4.1 Polynomial
      2. 21.4.2 Logarithmic
      3. 21.4.3 Other Transformations
      4. Exercise 21.2
    5. 21.5 Interactive Terms
      1. 21.5.1 Concept and Motivation
      2. 21.5.2 One Categorical, One Continuous
      3. 21.5.3 Two Categorical
      4. 21.5.4 Two Continuous
      5. 21.5.5 Higher-Order Interactions
      6. Exercise 21.3
    1. 22.1 Goodness-of-Fit vs. Complexity
      1. 22.1.1 Principle of Parsimony
      2. 22.1.2 General Guidelines
    2. 22.2 Model Selection Algorithms
      1. 22.2.1 Nested Comparisons: The Partial F-Test
      2. 22.2.2 Forward Selection
      3. 22.2.3 Backward Selection
      4. 22.2.4 Stepwise AIC Selection
      5. Exercise 22.1
      6. 22.2.5 Other Selection Algorithms
    3. 22.3 Residual Diagnostics
      1. 22.3.1 Inspecting and Interpreting Residuals
      2. 22.3.2 Assessing Normality
      3. 22.3.3 Illustrating Outliers, Leverage, and Influence
      4. 22.3.4 Calculating Leverage
      5. 22.3.5 Cook’s Distance
      6. 22.3.6 Graphically Combining Residuals, Leverage, and Cook’s Distance
      7. Exercise 22.2
    4. 22.4 Collinearity
      1. 22.4.1 Potential Warning Signs
      2. 22.4.2 Correlated Predictors: A Quick Example
    1. 23.1 Handling the Graphics Device
      1. 23.1.1 Manually Opening a New Device
      2. 23.1.2 Switching Between Devices
      3. 23.1.3 Closing a Device
      4. 23.1.4 Multiple Plots in One Device
    2. 23.2 Plotting Regions and Margins
      1. 23.2.1 Default Spacing
      2. 23.2.2 Custom Spacing
      3. 23.2.3 Clipping
    3. 23.3 Point-and-Click Coordinate Interaction
      1. 23.3.1 Retrieving Coordinates Silently
      2. 23.3.2 Visualizing Selected Coordinates
      3. 23.3.3 Ad Hoc Annotation
      4. Exercise 23.1
    4. 23.4 Customizing Traditional R Plots
      1. 23.4.1 Graphical Parameters for Style and Suppression
      2. 23.4.2 Customizing Boxes
      3. 23.4.3 Customizing Axes
    5. 23.5 Specialized Text and Label Notation
      1. 23.5.1 Font
      2. 23.5.2 Greek Symbols
      3. 23.5.3 Mathematical Expressions
    6. 23.6 A Fully Annotated Scatterplot
      1. Exercise 23.2
    1. 24.1 ggplot or qplot?
    2. 24.2 Smoothing and Shading
      1. 24.2.1 Adding LOESS Trends
      2. 24.2.2 Constructing Smooth Density Estimates
    3. 24.3 Multiple Plots and Variable-Mapped Facets
      1. 24.3.1 Independent Plots
      2. 24.3.2 Facets Mapped to a Categorical Variable
      3. Exercise 24.1
    4. 24.4 Interactive Tools in ggvis
      1. Exercise 24.2
    1. 25.1 Representing and Using Color
      1. 25.1.1 Red-Green-Blue Hexadecimal Color Codes
      2. 25.1.2 Built-in Palettes
      3. 25.1.3 Custom Palettes
      4. 25.1.4 Using Color Palettes to Index a Continuum
      5. 25.1.5 Including a Color Legend
      6. 25.1.6 Opacity
      7. 25.1.7 RGB Alternatives and Further Functionality
      8. Exercise 25.1
    2. 25.2 3D Scatterplots
      1. 25.2.1 Basic Syntax
      2. 25.2.2 Visual Enhancements
      3. Exercise 25.2
    3. 25.3 Preparing a Surface for Plotting
      1. 25.3.1 Constructing an Evaluation Grid
      2. 25.3.2 Constructing the z-Matrix
      3. 25.3.3 Conceptualizing the z-Matrix
    4. 25.4 Contour Plots
      1. 25.4.1 Drawing Contour Lines
      2. 25.4.2 Color-Filled Contours
      3. Exercise 25.3
    5. 25.5 Pixel Images
      1. 25.5.1 One Grid Point = One Pixel
      2. 25.5.2 Surface Truncation and Empty Pixels
      3. Exercise 25.4
    6. 25.6 Perspective Plots
      1. 25.6.1 Basic Plots and Angle Adjustment
      2. 25.6.2 Coloring Facets
      3. 25.6.3 Rotating with Loops
      4. Exercise 25.5
    1. 26.1 Point Clouds
      1. 26.1.1 Basic 3D Cloud
      2. 26.1.2 Visual Enhancements and Legends
      3. 26.1.3 Adding Further 3D Components
      4. Exercise 26.1
    2. 26.2 Bivariate Surfaces
      1. 26.2.1 Basic Perspective Surface
      2. 26.2.2 Additional Components
      3. 26.2.3 Coloring by z Value
      4. 26.2.4 Dealing with the Aspect Ratio
      5. Exercise 26.2
    3. 26.3 Trivariate Surfaces
      1. 26.3.1 Evaluation Coordinates in 3D
      2. 26.3.2 Isosurfaces
      3. 26.3.3 Example: Nonparametric Trivariate Density
    4. 26.4 Handling Parametric Equations
      1. 26.4.1 Simple Loci
      2. 26.4.2 Mathematical Abstractions
      3. Exercise 26.3
    1. A.1 Downloading and Installing R
    2. A.2 Using Packages
      1. A.2.1 Base Packages
      2. A.2.2 Recommended Packages
      3. A.2.3 Contributed Packages
    3. A.3 Updating R and Installed Packages
    4. A.4 Using Other Mirrors and Repositories
      1. A.4.1 Switching CRAN Mirror
      2. A.4.2 Other Package Repositories
    5. A.5 Citing and Writing Packages
      1. A.5.1 Citing R and Contributed Packages
      2. A.5.2 Writing Your Own Packages
    1. B.1 Basic Layout and Usage
      1. B.1.1 Editor Features and Appearance Options
      2. B.1.2 Customizing Panes
    2. B.2 Auxiliary Tools
      1. B.2.1 Projects
      2. B.2.2 Package Installer and Updater
      3. B.2.3 Support for Debugging
      4. B.2.4 Markup, Document, and Graphics Tools