The R Book, 2nd Edition

Book Description

Hugely successful and popular text presenting an extensive and comprehensive guide for all R users

The R language is recognized as one of the most powerful and flexible statistical software packages. It lets users apply to large data sets many statistical techniques that would be impossible to implement without such software. R has become an essential tool for understanding and carrying out research.

This edition:

  • Features full colour text and extensive graphics throughout.

  • Introduces a clear structure with numbered section headings to help readers locate information more efficiently.

  • Looks at the evolution of R over the past five years.

  • Features new chapters on Meta-Analysis and Bayesian Statistics.

  • Presents a fully revised and updated bibliography and reference section.

  • Is supported by an accompanying website allowing examples from the text to be run by the user.

Praise for the first edition:

'...if you are an R user or wannabe R user, this text is the one that should be on your shelf. The breadth of topics covered is unsurpassed when it comes to texts on data analysis in R.' (The American Statistician, August 2008)

'The High-level software language of R is setting standards in quantitative analysis. And now anybody can get to grips with it thanks to The R Book...' (Professional Pensions, July 2007)

Table of Contents

  Cover
  Title Page
  Copyright
  Preface
    Acknowledgements
  Chapter 1: Getting Started
    1.1 How to use this book
    1.2 Installing R
    1.3 Running R
    1.4 The Comprehensive R Archive Network
    1.5 Getting help in R
    1.6 Packages in R
    1.7 Command line versus scripts
    1.8 Data editor
    1.9 Changing the look of the R screen
    1.10 Good housekeeping
    1.11 Linking to other computer languages
  Chapter 2: Essentials of the R Language
    2.1 Calculations
    2.2 Logical operations
    2.3 Generating sequences
    2.4 Membership: Testing and coercing in R
    2.5 Missing values, infinity and things that are not numbers
    2.6 Vectors and subscripts
    2.7 Vector functions
    2.8 Matrices and arrays
    2.9 Random numbers, sampling and shuffling
    2.10 Loops and repeats
    2.11 Lists
    2.12 Text, character strings and pattern matching
    2.13 Dates and times in R
    2.14 Environments
    2.15 Writing R functions
    2.16 Writing from R to file
    2.17 Programming tips
  Chapter 3: Data Input
    3.1 Data input from the keyboard
    3.2 Data input from files
    3.3 Input from files using scan
    3.4 Reading data from a file using readLines
    3.5 Warnings when you attach the dataframe
    3.6 Masking
    3.7 Input and output formats
    3.8 Checking files from the command line
    3.9 Reading dates and times from files
    3.10 Built-in data files
    3.11 File paths
    3.12 Connections
    3.13 Reading data from an external database
  Chapter 4: Dataframes
    4.1 Subscripts and indices
    4.2 Selecting rows from the dataframe at random
    4.3 Sorting dataframes
    4.4 Using logical conditions to select rows from the dataframe
    4.5 Omitting rows containing missing values, NA
    4.6 Using order and !duplicated to eliminate pseudoreplication
    4.7 Complex ordering with mixed directions
    4.8 A dataframe with row names instead of row numbers
    4.9 Creating a dataframe from another kind of object
    4.10 Eliminating duplicate rows from a dataframe
    4.11 Dates in dataframes
    4.12 Using the match function in dataframes
    4.13 Merging two dataframes
    4.14 Adding margins to a dataframe
    4.15 Summarizing the contents of dataframes
  Chapter 5: Graphics
    5.1 Plots with two variables
    5.2 Plotting with two continuous explanatory variables: Scatterplots
    5.3 Adding other shapes to a plot
    5.4 Drawing mathematical functions
    5.5 Shape and size of the graphics window
    5.6 Plotting with a categorical explanatory variable
    5.7 Plots for single samples
    5.8 Plots with multiple variables
    5.9 Special plots
    5.10 Saving graphics to file
    5.11 Summary
  Chapter 6: Tables
    6.1 Tables of counts
    6.2 Summary tables
    6.3 Expanding a table into a dataframe
    6.4 Converting from a dataframe to a table
    6.5 Calculating tables of proportions with prop.table
    6.6 The scale function
    6.7 The expand.grid function
    6.8 The model.matrix function
    6.9 Comparing table and tabulate
  Chapter 7: Mathematics
    7.1 Mathematical functions
    7.2 Probability functions
    7.3 Continuous probability distributions
    7.4 Discrete probability distributions
    7.5 Matrix algebra
    7.6 Solving systems of linear equations using matrices
    7.7 Calculus
  Chapter 8: Classical Tests
    8.1 Single samples
    8.2 Bootstrap in hypothesis testing
    8.3 Skew and kurtosis
    8.4 Two samples
    8.5 Tests on paired samples
    8.6 The sign test
    8.7 Binomial test to compare two proportions
    8.8 Chi-squared contingency tables
    8.9 Correlation and covariance
    8.10 Kolmogorov–Smirnov test
    8.11 Power analysis
    8.12 Bootstrap
  Chapter 9: Statistical Modelling
    9.1 First things first
    9.2 Maximum likelihood
    9.3 The principle of parsimony (Occam's razor)
    9.4 Types of statistical model
    9.5 Steps involved in model simplification
    9.6 Model formulae in R
    9.7 Multiple error terms
    9.8 The intercept as parameter 1
    9.9 The update function in model simplification
    9.10 Model formulae for regression
    9.11 Box–Cox transformations
    9.12 Model criticism
    9.13 Model checking
    9.14 Influence
    9.15 Summary of statistical models in R
    9.16 Optional arguments in model-fitting functions
    9.17 Akaike's information criterion
    9.18 Leverage
    9.19 Misspecified model
    9.20 Model checking in R
    9.21 Extracting information from model objects
    9.22 The summary tables for continuous and categorical explanatory variables
    9.23 Contrasts
    9.24 Model simplification by stepwise deletion
    9.25 Comparison of the three kinds of contrasts
    9.26 Aliasing
    9.27 Orthogonal polynomial contrasts: contr.poly
    9.28 Summary of statistical modelling
  Chapter 10: Regression
    10.1 Linear regression
    10.2 Polynomial approximations to elementary functions
    10.3 Polynomial regression
    10.4 Fitting a mechanistic model to data
    10.5 Linear regression after transformation
    10.6 Prediction following regression
    10.7 Testing for lack of fit in a regression
    10.8 Bootstrap with regression
    10.9 Jackknife with regression
    10.10 Jackknife after bootstrap
    10.11 Serial correlation in the residuals
    10.12 Piecewise regression
    10.13 Multiple regression
  Chapter 11: Analysis of Variance
    11.1 One-way ANOVA
    11.2 Factorial experiments
    11.3 Pseudoreplication: Nested designs and split plots
    11.4 Variance components analysis
    11.5 Effect sizes in ANOVA: aov or lm?
    11.6 Multiple comparisons
    11.7 Multivariate analysis of variance
  Chapter 12: Analysis of Covariance
    12.1 Analysis of covariance in R
    12.2 ANCOVA and experimental design
    12.3 ANCOVA with two factors and one continuous covariate
    12.4 Contrasts and the parameters of ANCOVA models
    12.5 Order matters in summary.aov
  Chapter 13: Generalized Linear Models
    13.1 Error structure
    13.2 Linear predictor
    13.3 Link function
    13.4 Proportion data and binomial errors
    13.5 Count data and Poisson errors
    13.6 Deviance: Measuring the goodness of fit of a GLM
    13.7 Quasi-likelihood
    13.8 The quasi family of models
    13.9 Generalized additive models
    13.10 Offsets
    13.11 Residuals
    13.12 Overdispersion
    13.13 Bootstrapping a GLM
    13.14 Binomial GLM with ordered categorical variables
  Chapter 14: Count Data
    14.1 A regression with Poisson errors
    14.2 Analysis of deviance with count data
    14.3 Analysis of covariance with count data
    14.4 Frequency distributions
    14.5 Overdispersion in log-linear models
    14.6 Negative binomial errors
  Chapter 15: Count Data in Tables
    15.1 A two-class table of counts
    15.2 Sample size for count data
    15.3 A four-class table of counts
    15.4 Two-by-two contingency tables
    15.5 Using log-linear models for simple contingency tables
    15.6 The danger of contingency tables
    15.7 Quasi-Poisson and negative binomial models compared
    15.8 A contingency table of intermediate complexity
    15.9 Schoener's lizards: A complex contingency table
    15.10 Plot methods for contingency tables
    15.11 Graphics for count data: Spine plots and spinograms
  Chapter 16: Proportion Data
    16.1 Analyses of data on one and two proportions
    16.2 Count data on proportions
    16.3 Odds
    16.4 Overdispersion and hypothesis testing
    16.5 Applications
    16.6 Averaging proportions
    16.7 Summary of modelling with proportion count data
    16.8 Analysis of covariance with binomial data
    16.9 Converting complex contingency tables to proportions
  Chapter 17: Binary Response Variables
    17.1 Incidence functions
    17.2 Graphical tests of the fit of the logistic to data
    17.3 ANCOVA with a binary response variable
    17.4 Binary response with pseudoreplication
  Chapter 18: Generalized Additive Models
    18.1 Non-parametric smoothers
    18.2 Generalized additive models
    18.3 An example with strongly humped data
    18.4 Generalized additive models with binary data
    18.5 Three-dimensional graphic output from gam
  Chapter 19: Mixed-Effects Models
    19.1 Replication and pseudoreplication
    19.2 The lme and lmer functions
    19.3 Best linear unbiased predictors
    19.4 Designed experiments with different spatial scales: Split plots
    19.5 Hierarchical sampling and variance components analysis
    19.6 Mixed-effects models with temporal pseudoreplication
    19.7 Time series analysis in mixed-effects models
    19.8 Random effects in designed experiments
    19.9 Regression in mixed-effects models
    19.10 Generalized linear mixed models
  Chapter 20: Non-Linear Regression
    20.1 Comparing Michaelis–Menten and asymptotic exponential
    20.2 Generalized additive models
    20.3 Grouped data for non-linear estimation
    20.4 Non-linear time series models (temporal pseudoreplication)
    20.5 Self-starting functions
    20.6 Bootstrapping a family of non-linear regressions
  Chapter 21: Meta-Analysis
    21.1 Effect size
    21.2 Weights
    21.3 Fixed versus random effects
    21.4 Random-effects meta-analysis of binary data
  Chapter 22: Bayesian Statistics
    22.1 Background
    22.2 A continuous response variable
    22.3 Normal prior and normal likelihood
    22.4 Priors
    22.5 Bayesian statistics for realistically complicated models
    22.6 Practical considerations
    22.7 Writing BUGS models
    22.8 Packages in R for carrying out Bayesian analysis
    22.9 Installing JAGS on your computer
    22.10 Running JAGS in R
    22.11 MCMC for a simple linear regression
    22.12 MCMC for a model with temporal pseudoreplication
    22.13 MCMC for a model with binomial errors
  Chapter 23: Tree Models
    23.1 Background
    23.2 Regression trees
    23.3 Using rpart to fit tree models
    23.4 Tree models as regressions
    23.5 Model simplification
    23.6 Classification trees with categorical explanatory variables
    23.7 Classification trees for replicated data
    23.8 Testing for the existence of humps
  Chapter 24: Time Series Analysis
    24.1 Nicholson's blowflies
    24.2 Moving average
    24.3 Seasonal data
    24.4 Built-in time series functions
    24.5 Decompositions
    24.6 Testing for a trend in the time series
    24.7 Spectral analysis
    24.8 Multiple time series
    24.9 Simulated time series
    24.10 Time series models
  Chapter 25: Multivariate Statistics
    25.1 Principal components analysis
    25.2 Factor analysis
    25.3 Cluster analysis
    25.4 Hierarchical cluster analysis
    25.5 Discriminant analysis
    25.6 Neural networks
  Chapter 26: Spatial Statistics
    26.1 Point processes
    26.2 Nearest neighbours
    26.3 Tests for spatial randomness
    26.4 Packages for spatial statistics
    26.5 Geostatistical data
    26.6 Regression models with spatially correlated errors: Generalized least squares
    26.7 Creating a dot-distribution map from a relational database
  Chapter 27: Survival Analysis
    27.1 A Monte Carlo experiment
    27.2 Background
    27.3 The survivor function
    27.4 The density function
    27.5 The hazard function
    27.6 The exponential distribution
    27.7 Kaplan–Meier survival distributions
    27.8 Age-specific hazard models
    27.9 Survival analysis in R
    27.10 Parametric analysis
    27.11 Cox's proportional hazards
    27.12 Models with censoring
  Chapter 28: Simulation Models
    28.1 Temporal dynamics: Chaotic dynamics in population size
    28.2 Temporal and spatial dynamics: A simulated random walk in two dimensions
    28.3 Spatial simulation models
    28.4 Pattern generation resulting from dynamic interactions
  Chapter 29: Changing the Look of Graphics
    29.1 Graphs for publication
    29.2 Colour
    29.3 Cross-hatching
    29.4 Grey scale
    29.5 Coloured convex hulls and other polygons
    29.6 Logarithmic axes
    29.7 Different font families for text
    29.8 Mathematical and other symbols on plots
    29.9 Phase planes
    29.10 Fat arrows
    29.11 Three-dimensional plots
    29.12 Complex 3D plots with wireframe
    29.13 An alphabetical tour of the graphics parameters
    29.14 Trellis graphics
  References and Further Reading
  Index