R for Everyone: Advanced Analytics and Graphics

Book Description

Statistical Computation for Programmers, Scientists, Quants, Excel Users, and Other Professionals

Using the open source R language, you can build powerful statistical models to answer many of your most challenging questions. R has traditionally been difficult for non-statisticians to learn, and most R books assume far too much knowledge to be of help. R for Everyone is the solution.

Drawing on his unsurpassed experience teaching new users, professional data scientist Jared P. Lander has written the perfect tutorial for anyone new to statistical programming and modeling. Organized to make learning easy and intuitive, this guide focuses on the 20 percent of R functionality you’ll need to accomplish 80 percent of modern data tasks.

Lander’s self-contained chapters start with the absolute basics, offering extensive hands-on practice and sample code. You’ll download and install R; navigate and use the R environment; master basic program control, data import, and manipulation; and walk through several essential tests. Then, building on this foundation, you’ll construct several complete models, both linear and nonlinear, and use some data mining techniques.

By the time you’re done, you won’t just know how to write R programs; you’ll be ready to tackle the statistical problems you care about most.

COVERAGE INCLUDES

• Exploring R, RStudio, and R packages

• Using R for math: variable types, vectors, calling functions, and more

• Exploiting data structures, including data.frames, matrices, and lists

• Creating attractive, intuitive statistical graphics

• Writing user-defined functions

• Controlling program flow with if, ifelse, and complex checks

• Improving program efficiency with group manipulations

• Combining and reshaping multiple datasets

• Manipulating strings using R’s facilities and regular expressions

• Creating normal, binomial, and Poisson probability distributions

• Programming basic statistics: mean, standard deviation, and t-tests

• Building linear, generalized linear, and nonlinear models

• Assessing the quality of models and variable selection

• Preventing overfitting, using the Elastic Net and Bayesian methods

• Analyzing univariate and multivariate time series data

• Grouping data via K-means and hierarchical clustering

• Preparing reports, slideshows, and web pages with knitr

• Building reusable R packages with devtools and Rcpp

• Getting involved with the R global community

Table of Contents

  1. Title Page
  2. Copyright Page
  3. Dedication Page
  4. About This eBook
  5. Contents
  6. Foreword
  7. Preface
  8. Acknowledgments
  9. About the Author
  10. Chapter 1. Getting R
    1. 1.1. Downloading R
    2. 1.2. R Version
    3. 1.3. 32-bit versus 64-bit
    4. 1.4. Installing
    5. 1.5. Revolution R Community Edition
    6. 1.6. Conclusion
  11. Chapter 2. The R Environment
    1. 2.1. Command Line Interface
    2. 2.2. RStudio
    3. 2.3. Revolution Analytics RPE
    4. 2.4. Conclusion
  12. Chapter 3. R Packages
    1. 3.1. Installing Packages
    2. 3.2. Loading Packages
    3. 3.3. Building a Package
    4. 3.4. Conclusion
  13. Chapter 4. Basics of R
    1. 4.1. Basic Math
    2. 4.2. Variables
    3. 4.3. Data Types
    4. 4.4. Vectors
    5. 4.5. Calling Functions
    6. 4.6. Function Documentation
    7. 4.7. Missing Data
    8. 4.8. Conclusion
  14. Chapter 5. Advanced Data Structures
    1. 5.1. data.frames
    2. 5.2. Lists
    3. 5.3. Matrices
    4. 5.4. Arrays
    5. 5.5. Conclusion
  15. Chapter 6. Reading Data into R
    1. 6.1. Reading CSVs
    2. 6.2. Excel Data
    3. 6.3. Reading from Databases
    4. 6.4. Data from Other Statistical Tools
    5. 6.5. R Binary Files
    6. 6.6. Data Included with R
    7. 6.7. Extract Data from Web Sites
    8. 6.8. Conclusion
  16. Chapter 7. Statistical Graphics
    1. 7.1. Base Graphics
    2. 7.2. ggplot2
    3. 7.3. Conclusion
  17. Chapter 8. Writing R Functions
    1. 8.1. Hello, World!
    2. 8.2. Function Arguments
    3. 8.3. Return Values
    4. 8.4. do.call
    5. 8.5. Conclusion
  18. Chapter 9. Control Statements
    1. 9.1. if and else
    2. 9.2. switch
    3. 9.3. ifelse
    4. 9.4. Compound Tests
    5. 9.5. Conclusion
  19. Chapter 10. Loops, the Un-R Way to Iterate
    1. 10.1. for Loops
    2. 10.2. while Loops
    3. 10.3. Controlling Loops
    4. 10.4. Conclusion
  20. Chapter 11. Group Manipulation
    1. 11.1. Apply Family
    2. 11.2. aggregate
    3. 11.3. plyr
    4. 11.4. data.table
    5. 11.5. Conclusion
  21. Chapter 12. Data Reshaping
    1. 12.1. cbind and rbind
    2. 12.2. Joins
    3. 12.3. reshape2
    4. 12.4. Conclusion
  22. Chapter 13. Manipulating Strings
    1. 13.1. paste
    2. 13.2. sprintf
    3. 13.3. Extracting Text
    4. 13.4. Regular Expressions
    5. 13.5. Conclusion
  23. Chapter 14. Probability Distributions
    1. 14.1. Normal Distribution
    2. 14.2. Binomial Distribution
    3. 14.3. Poisson Distribution
    4. 14.4. Other Distributions
    5. 14.5. Conclusion
  24. Chapter 15. Basic Statistics
    1. 15.1. Summary Statistics
    2. 15.2. Correlation and Covariance
    3. 15.3. T-Tests
    4. 15.4. ANOVA
    5. 15.5. Conclusion
  25. Chapter 16. Linear Models
    1. 16.1. Simple Linear Regression
    2. 16.2. Multiple Regression
    3. 16.3. Conclusion
  26. Chapter 17. Generalized Linear Models
    1. 17.1. Logistic Regression
    2. 17.2. Poisson Regression
    3. 17.3. Other Generalized Linear Models
    4. 17.4. Survival Analysis
    5. 17.5. Conclusion
  27. Chapter 18. Model Diagnostics
    1. 18.1. Residuals
    2. 18.2. Comparing Models
    3. 18.3. Cross-Validation
    4. 18.4. Bootstrap
    5. 18.5. Stepwise Variable Selection
    6. 18.6. Conclusion
  28. Chapter 19. Regularization and Shrinkage
    1. 19.1. Elastic Net
    2. 19.2. Bayesian Shrinkage
    3. 19.3. Conclusion
  29. Chapter 20. Nonlinear Models
    1. 20.1. Nonlinear Least Squares
    2. 20.2. Splines
    3. 20.3. Generalized Additive Models
    4. 20.4. Decision Trees
    5. 20.5. Random Forests
    6. 20.6. Conclusion
  30. Chapter 21. Time Series and Autocorrelation
    1. 21.1. Autoregressive Moving Average
    2. 21.2. VAR
    3. 21.3. GARCH
    4. 21.4. Conclusion
  31. Chapter 22. Clustering
    1. 22.1. K-means
    2. 22.2. PAM
    3. 22.3. Hierarchical Clustering
    4. 22.4. Conclusion
  32. Chapter 23. Reproducibility, Reports and Slide Shows with knitr
    1. 23.1. Installing a LaTeX Program
    2. 23.2. LaTeX Primer
    3. 23.3. Using knitr with LaTeX
    4. 23.4. Markdown Tips
    5. 23.5. Using knitr and Markdown
    6. 23.6. pandoc
    7. 23.7. Conclusion
  33. Chapter 24. Building R Packages
    1. 24.1. Folder Structure
    2. 24.2. Package Files
    3. 24.3. Package Documentation
    4. 24.4. Checking, Building and Installing
    5. 24.5. Submitting to CRAN
    6. 24.6. C++ Code
    7. 24.7. Conclusion
  34. Appendix A. Real-Life Resources
    1. A.1. Meetups
    2. A.2. Stack Overflow
    3. A.3. Twitter
    4. A.4. Conferences
    5. A.5. Web Sites
    6. A.6. Documents
    7. A.7. Books
    8. A.8. Conclusion
  35. Appendix B. Glossary
  36. List of Figures
  37. List of Tables
  38. General Index
  39. Index of Functions
  40. Index of Packages
  41. Index of People
  42. Data Index