The R Book

Book Description

The high-level language of R is recognized as one of the most powerful and flexible statistical software environments, and is rapidly becoming the standard setting for quantitative analysis, statistics and graphics. R provides free access to unrivalled coverage and cutting-edge applications, enabling the user to apply numerous statistical methods ranging from simple regression to time series or multivariate analysis.

Building on the success of the author's bestselling Statistics: An Introduction using R, The R Book is packed with worked examples, providing an all-inclusive guide to R, ideal for novice and more accomplished users alike. The book assumes no background in statistics or computing and introduces the advantages of the R environment, detailing its applications in a wide range of disciplines.

  • Provides the first comprehensive reference manual for the R language, including practical guidance and full coverage of the graphics facilities.

  • Introduces all the statistical models covered by R, beginning with simple classical tests such as chi-square and t-test.

  • Proceeds to examine more advanced methods, from regression and analysis of variance, through to generalized linear models, generalized mixed models, time series, spatial statistics, multivariate statistics and much more.

The R Book is aimed at undergraduates, postgraduates and professionals in science, engineering and medicine. It is also ideal for students and professionals in statistics, economics, geography and the social sciences.

Table of Contents

  1. Cover Page
  2. Title Page
  3. Copyright
  4. Contents
  5. Preface
    1. Acknowledgements
  6. 1: Getting Started
    1. Installing R
    2. Running R
    3. Getting Help in R
    4. Online Help
    5. Worked Examples of Functions
    6. Demonstrations of R Functions
    7. Libraries in R
    8. Contents of Libraries
    9. Installing Packages and Libraries
    10. Command Line versus Scripts
    11. Data Editor
    12. Changing the Look of the R Screen
    13. Significance Stars
    14. Disappearing Graphics
    15. Good Housekeeping
    16. Linking to Other Computer Languages
    17. Tidying Up
  7. 2: Essentials of the R Language
    1. Screen prompt
    2. Built-in Functions
    3. Numbers with Exponents
    4. Modulo and Integer Quotients
    5. Rounding
    6. Infinity and Things that Are Not a Number (NaN)
    7. Missing values NA
    8. Assignment
    9. Operators
    10. Creating a Vector
    11. Named Elements within Vectors
    12. Vector Functions
    13. Summary Information from Vectors by Groups
    14. Using with rather than attach
    15. Using attach in This Book
    16. Parallel Minima and Maxima: pmin and pmax
    17. Subscripts and Indices
    18. Working with Vectors and Logical Subscripts
    19. Addresses within Vectors
    20. Finding Closest Values
    21. Trimming Vectors Using Negative Subscripts
    22. Logical Arithmetic
    23. Evaluation of combinations of TRUE and FALSE
    24. Repeats
    25. Generate Factor Levels
    26. Generating Regular Sequences of Numbers
    27. Variable Names
    28. Sorting, Ranking and Ordering
    29. The sample Function
    30. Matrices
    31. Arrays
    32. Character Strings
    33. The match Function
    34. Writing functions in R
    35. Variance
    36. Degrees of freedom
    37. Variance Ratio Test
    38. Using Variance
    39. Error Bars
    40. Loops and Repeats
    41. The switch Function
    42. The Evaluation Environment of a Function
    43. Scope
    44. Optional Arguments
    45. Variable Numbers of Arguments (...)
    46. Returning Values from a Function
    47. Anonymous Functions
    48. Flexible Handling of Arguments to Functions
    49. Evaluating Functions with apply, sapply and lapply
    50. Looking for runs of numbers within vectors
    51. Saving Data Produced within R to Disc
    52. Pasting into an Excel Spreadsheet
    53. Writing an Excel Readable File from R
    54. Testing for Equality
    55. Sets: union, intersect and setdiff
    56. Pattern Matching
    57. Testing and Coercing in R
    58. Dates and Times in R
  8. 3: Data Input
    1. The scan Function
    2. Data Input from Files
    3. Saving the File from Excel
    4. Common Errors when Using read.table
    5. Browsing to Find Files
    6. Separators and Decimal Points
    7. Input and Output Formats
    8. Setting the Working Directory
    9. Checking Files from the Command Line
    10. Reading Dates and Times from Files
    11. Built-in Data Files
    12. Reading Data from Files with Non-standard Formats Using scan
    13. Reading Files with Different Numbers of Values per Line
    14. The readLines Function
  9. 4: Dataframes
    1. Subscripts and Indices
    2. Selecting Rows from the Dataframe at Random
    3. Sorting Dataframes
    4. Using Logical Conditions to Select Rows from the Dataframe
    5. Omitting Rows Containing Missing Values, NA
    6. Using order and unique to Eliminate Pseudoreplication
    7. Complex Ordering with Mixed Directions
    8. A Dataframe with Row Names instead of Row Numbers
    9. Creating a Dataframe from Another Kind of Object
    10. Eliminating Duplicate Rows from a Dataframe
    11. Dates in Dataframes
    12. Selecting Variables on the Basis of their Attributes
    13. Using the match Function in Dataframes
    14. Merging Two Dataframes
    15. Adding Margins to a Dataframe
    16. Summarizing the Contents of Dataframes
  10. 5: Graphics
    1. Plots with Two Variables
    2. Plots for Single Samples
    3. Plots with multiple variables
    4. Special Plots
    5. Summary
  11. 6: Tables
    1. Summary Tables
    2. Tables of Counts
    3. Expanding a Table into a Dataframe
    4. Converting from a Dataframe to a Table
    5. Calculating tables of proportions
    6. The scale function
    7. The expand.grid function
    8. The model.matrix function
  12. 7: Mathematics
    1. Mathematical Functions
    2. Continuous Probability Distributions
    3. Discrete probability distributions
    4. Matrix Algebra
    5. Calculus
    6. Differential equations
  13. 8: Classical Tests
    1. Single Samples
    2. Two samples
  14. 9: Statistical Modelling
    1. Maximum Likelihood
    2. The Principle of Parsimony (Occam's Razor)
    3. Types of Statistical Model
    4. Steps Involved in Model Simplification
    5. Model Formulae in R
    6. Box–Cox Transformations
    7. Model Criticism
    8. Model checking
    9. Summary of Statistical Models in R
    10. Optional arguments in model-fitting functions
    11. Dataframes containing the same variable names
    12. Akaike's Information Criterion
    13. Misspecified Model
    14. Model checking in R
  15. 10: Regression
    1. Linear Regression
    2. Polynomial Approximations to Elementary Functions
    3. Polynomial Regression
    4. Fitting a Mechanistic Model to Data
    5. Linear Regression after Transformation
    6. Prediction following Regression
    7. Testing for Lack of Fit in a Regression with Replicated Data at Each Level of x
    8. Bootstrap with Regression
    9. Jackknife with regression
    10. Jackknife after Bootstrap
    11. Serial correlation in the residuals
    12. Piecewise Regression
    13. Robust Fitting of Linear Models
    14. Model Simplification
    15. The Multiple Regression Model
  16. 11: Analysis of Variance
    1. One-Way ANOVA
    2. ANOVA with aov or lm
    3. Effect Sizes
    4. Multiple Comparisons
    5. Projections of Models
    6. Multivariate Analysis of Variance
  17. 12: Analysis of Covariance
    1. ANCOVA and Experimental Design
    2. A More Complex ANCOVA: Two Factors and One Continuous Covariate
    3. Order matters in summary.aov
  18. 13: Generalized Linear Models
    1. Error Structure
    2. Linear Predictor
    3. Link Function
    4. Canonical Link Functions
    5. Proportion Data and Binomial Errors
    6. Count Data and Poisson Errors
    7. Deviance: Measuring the Goodness of Fit of a GLM
    8. Quasi-likelihood
    9. Generalized Additive Models
    10. Offsets
    11. Residuals
    12. Misspecified Error Structure
    13. Misspecified Link Function
    14. Overdispersion
    15. Bootstrapping a GLM
  19. 14: Count Data
    1. A Regression with Poisson Errors
    2. Analysis of Deviance with Count Data
    3. Frequency Distributions
    4. Overdispersion in Log-linear Models
    5. Negative binomial errors
    6. Use of lmer with Complex Nesting
  20. 15: Count Data in Tables
    1. A Two-Class Table of Counts
    2. Sample Size for Count Data
    3. A Four-Class Table of Counts
    4. Two-by-Two Contingency Tables
    5. Using Log-linear Models for Simple Contingency Tables
    6. The Danger of Contingency Tables
    7. Quasi-Poisson and Negative Binomial Models Compared
    8. A Contingency Table of Intermediate Complexity
    9. Schoener's Lizards: A Complex Contingency Table
    10. Plot Methods for Contingency Tables
  21. 16: Proportion Data
    1. Analyses of Data on One and Two Proportions
    2. Odds
    3. Overdispersion and Hypothesis Testing
    4. Applications
    5. Estimating LD50 and LD90 from bioassay data
    6. Converting Complex Contingency Tables to Proportions
    7. Analysing Schoener's Lizards as Proportion Data
    8. Generalized mixed models lmer with proportion data
  22. 17: Binary Response Variables
    1. Incidence functions
    2. Graphical Tests of the Fit of the Logistic to Data
    3. ANCOVA with a Binary Response Variable
    4. Binary Response with Pseudoreplication
  23. 18: Generalized Additive Models
    1. Non-parametric Smoothers
    2. Generalized Additive Models
    3. An example with strongly humped data
    4. Generalized Additive Models with Binary Data
    5. Three-Dimensional Graphic Output from gam
  24. 19: Mixed-Effects Models
    1. Replication and Pseudoreplication
    2. The lme and lmer Functions
    3. Best Linear Unbiased Predictors
    4. A Designed Experiment with Different Spatial Scales: Split Plots
    5. Hierarchical Sampling and Variance Components Analysis
    6. Model Simplification in Hierarchical Sampling
    7. Mixed-Effects Models with Temporal Pseudoreplication
    8. Time Series Analysis in Mixed-Effects Models
    9. Random Effects in Designed Experiments
    10. Regression in Mixed-Effects Models
    11. Generalized Linear Mixed Models
    12. Fixed Effects in Hierarchical Sampling
    13. Error Plots from a Hierarchical Analysis
  25. 20: Non-linear Regression
    1. Comparing Michaelis–Menten and Asymptotic Exponential
    2. Generalized Additive Models
    3. Grouped Data for Non-linear Estimation
    4. Non-linear Time Series Models (Temporal Pseudoreplication)
    5. Self-starting Functions
    6. Self-starting four-parameter logistic
    7. Bootstrapping a Family of Non-linear Regressions
  26. 21: Tree Models
    1. Background
    2. Regression Trees
    3. Classification trees with categorical explanatory variables
    4. Classification trees for replicated data
    5. Testing for the existence of humps
  27. 22: Time Series Analysis
    1. Nicholson's Blowflies
    2. Moving Average
    3. Seasonal Data
    4. Built-in Time Series Functions
    5. Decompositions
    6. Testing for a Trend in the Time Series
    7. Spectral Analysis
    8. Multiple Time Series
    9. Simulated Time Series
    10. Time Series Models
    11. Time series modelling on the Canadian lynx data
  28. 23: Multivariate Statistics
    1. Principal Components Analysis
    2. Factor Analysis
    3. Cluster Analysis
    4. Neural Networks
  29. 24: Spatial Statistics
    1. Point Processes
    2. Nearest Neighbours
    3. Tests for Spatial Randomness
    4. Libraries for spatial statistics
    5. The spatstat library
    6. Geostatistical data
    7. Regression Models with Spatially Correlated Errors: Generalized Least Squares
  30. 25: Survival Analysis
    1. A Monte Carlo Experiment
    2. Background
    3. The Exponential Distribution
    4. Kaplan–Meier Survival Distributions
    5. Age-Specific Hazard Models
    6. Parametric analysis
    7. Cox's Proportional Hazards
    8. Models with Censoring
  31. 26: Simulation Models
    1. Temporal and Spatial Dynamics: a Simulated Random Walk in Two Dimensions
    2. Spatial Simulation Models
    3. Metapopulation dynamics
    4. Pattern Generation Resulting from Dynamic Interactions
  32. 27: Changing the Look of Graphics
    1. Graphs for Publication
    2. Shading
    3. Logarithmic Axes
    4. Axis Labels Containing Subscripts and Superscripts
    5. Different font families for text
    6. Mathematical Symbols on Plots
    7. Phase Planes
    8. Fat Arrows
    9. Trellis Plots
    10. Three-Dimensional Plots
    11. Complex 3D plots with wireframe
    12. An Alphabetical Tour of the Graphics Parameters
  33. References and Further Reading
  34. Index