R for Data Science

Book description

Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible.

Authors Hadley Wickham and Garrett Grolemund guide you through the steps of importing, wrangling, exploring, and modeling your data and communicating the results. You’ll get a complete, big-picture understanding of the data science cycle, along with the basic tools you need to manage the details. Each section of the book is paired with exercises to help you practice what you’ve learned along the way.

You’ll learn how to:

  • Wrangle—transform your datasets into a form convenient for analysis
  • Program—learn powerful R tools for solving data problems with greater clarity and ease
  • Explore—examine your data, generate hypotheses, and quickly test them
  • Model—provide a low-dimensional summary that captures true "signals" in your dataset
  • Communicate—learn R Markdown for integrating prose, code, and results

Table of contents

  1. Preface
    1. What You Will Learn
    2. How This Book Is Organized
    3. What You Won’t Learn
      1. Big Data
      2. Python, Julia, and Friends
      3. Nonrectangular Data
      4. Hypothesis Confirmation
    4. Prerequisites
      1. R
      2. RStudio
      3. The Tidyverse
      4. Other Packages
    5. Running R Code
    6. Getting Help and Learning More
    7. Acknowledgments
    8. Online Version
    9. Conventions Used in This Book
    10. Using Code Examples
    11. O’Reilly Online Learning
    12. How to Contact Us
  2. I. Explore
  3. 1. Data Visualization with ggplot2
    1. Introduction
      1. Prerequisites
    2. First Steps
      1. The mpg Data Frame
      2. Creating a ggplot
      3. A Graphing Template
      4. Exercises
    3. Aesthetic Mappings
      1. Exercises
    4. Common Problems
    5. Facets
      1. Exercises
    6. Geometric Objects
      1. Exercises
    7. Statistical Transformations
      1. Exercises
    8. Position Adjustments
      1. Exercises
    9. Coordinate Systems
      1. Exercises
    10. The Layered Grammar of Graphics
  4. 2. Workflow: Basics
    1. Coding Basics
    2. What’s in a Name?
    3. Calling Functions
      1. Exercises
  5. 3. Data Transformation with dplyr
    1. Introduction
      1. Prerequisites
      2. nycflights13
      3. dplyr Basics
    2. Filter Rows with filter()
      1. Comparisons
      2. Logical Operators
      3. Missing Values
      4. Exercises
    3. Arrange Rows with arrange()
      1. Exercises
    4. Select Columns with select()
      1. Exercises
    5. Add New Variables with mutate()
      1. Useful Creation Functions
      2. Exercises
    6. Grouped Summaries with summarize()
      1. Combining Multiple Operations with the Pipe
      2. Missing Values
      3. Counts
      4. Useful Summary Functions
      5. Grouping by Multiple Variables
      6. Ungrouping
      7. Exercises
    7. Grouped Mutates (and Filters)
      1. Exercises
  6. 4. Workflow: Scripts
    1. Running Code
    2. RStudio Diagnostics
      1. Exercises
  7. 5. Exploratory Data Analysis
    1. Introduction
      1. Prerequisites
    2. Questions
    3. Variation
      1. Visualizing Distributions
      2. Typical Values
      3. Unusual Values
      4. Exercises
    4. Missing Values
      1. Exercises
    5. Covariation
      1. A Categorical and Continuous Variable
      2. Exercises
      3. Two Categorical Variables
      4. Exercises
      5. Two Continuous Variables
      6. Exercises
    6. Patterns and Models
    7. ggplot2 Calls
    8. Learning More
  8. 6. Workflow: Projects
    1. What Is Real?
    2. Where Does Your Analysis Live?
    3. Paths and Directories
    4. RStudio Projects
    5. Summary
  9. II. Wrangle
  10. 7. Tibbles with tibble
    1. Introduction
      1. Prerequisites
    2. Creating Tibbles
    3. Tibbles Versus data.frame
      1. Printing
      2. Subsetting
    4. Interacting with Older Code
      1. Exercises
  11. 8. Data Import with readr
    1. Introduction
      1. Prerequisites
    2. Getting Started
      1. Compared to Base R
      2. Exercises
    3. Parsing a Vector
      1. Numbers
      2. Strings
      3. Factors
      4. Dates, Date-Times, and Times
      5. Exercises
    4. Parsing a File
      1. Strategy
      2. Problems
      3. Other Strategies
    5. Writing to a File
    6. Other Types of Data
  12. 9. Tidy Data with tidyr
    1. Introduction
      1. Prerequisites
    2. Tidy Data
      1. Exercises
    3. Spreading and Gathering
      1. Gathering
      2. Spreading
      3. Exercises
    4. Separating and Uniting
      1. Separate
      2. Unite
      3. Exercises
    5. Missing Values
      1. Exercises
    6. Case Study
      1. Exercises
    7. Nontidy Data
  13. 10. Relational Data with dplyr
    1. Introduction
      1. Prerequisites
    2. nycflights13
      1. Exercises
    3. Keys
      1. Exercises
    4. Mutating Joins
      1. Understanding Joins
      2. Inner Join
      3. Outer Joins
      4. Duplicate Keys
      5. Defining the Key Columns
      6. Exercises
      7. Other Implementations
    5. Filtering Joins
      1. Exercises
    6. Join Problems
    7. Set Operations
  14. 11. Strings with stringr
    1. Introduction
      1. Prerequisites
    2. String Basics
      1. String Length
      2. Combining Strings
      3. Subsetting Strings
      4. Locales
      5. Exercises
    3. Matching Patterns with Regular Expressions
      1. Basic Matches
      2. Exercises
      3. Anchors
      4. Exercises
      5. Character Classes and Alternatives
      6. Exercises
      7. Repetition
      8. Exercises
      9. Grouping and Backreferences
      10. Exercises
    4. Tools
      1. Detect Matches
      2. Exercises
      3. Extract Matches
      4. Exercises
      5. Grouped Matches
      6. Exercises
      7. Replacing Matches
      8. Exercises
      9. Splitting
      10. Exercises
      11. Find Matches
    5. Other Types of Pattern
      1. Exercises
    6. Other Uses of Regular Expressions
    7. stringi
      1. Exercises
  15. 12. Factors with forcats
    1. Introduction
      1. Prerequisites
    2. Creating Factors
    3. General Social Survey
      1. Exercises
    4. Modifying Factor Order
      1. Exercises
    5. Modifying Factor Levels
      1. Exercises
  16. 13. Dates and Times with lubridate
    1. Introduction
      1. Prerequisites
    2. Creating Date/Times
      1. From Strings
      2. From Individual Components
      3. From Other Types
      4. Exercises
    3. Date-Time Components
      1. Getting Components
      2. Rounding
      3. Setting Components
      4. Exercises
    4. Time Spans
      1. Durations
      2. Periods
      3. Intervals
      4. Summary
      5. Exercises
    5. Time Zones
  17. III. Program
  18. 14. Pipes with magrittr
    1. Introduction
      1. Prerequisites
    2. Piping Alternatives
      1. Intermediate Steps
      2. Overwrite the Original
      3. Function Composition
      4. Use the Pipe
    3. When Not to Use the Pipe
    4. Other Tools from magrittr
  19. 15. Functions
    1. Introduction
      1. Prerequisites
    2. When Should You Write a Function?
      1. Exercises
    3. Functions Are for Humans and Computers
      1. Exercises
    4. Conditional Execution
      1. Conditions
      2. Multiple Conditions
      3. Code Style
      4. Exercises
    5. Function Arguments
      1. Choosing Names
      2. Checking Values
      3. Dot-Dot-Dot (…)
      4. Lazy Evaluation
      5. Exercises
    6. Return Values
      1. Explicit Return Statements
      2. Writing Pipeable Functions
    7. Environment
  20. 16. Vectors
    1. Introduction
      1. Prerequisites
    2. Vector Basics
    3. Important Types of Atomic Vector
      1. Logical
      2. Numeric
      3. Character
      4. Missing Values
      5. Exercises
    4. Using Atomic Vectors
      1. Coercion
      2. Test Functions
      3. Scalars and Recycling Rules
      4. Naming Vectors
      5. Subsetting
      6. Exercises
    5. Recursive Vectors (Lists)
      1. Visualizing Lists
      2. Subsetting
      3. Lists of Condiments
      4. Exercises
    6. Attributes
    7. Augmented Vectors
      1. Factors
      2. Dates and Date-Times
      3. Tibbles
      4. Exercises
  21. 17. Iteration with purrr
    1. Introduction
      1. Prerequisites
    2. For Loops
      1. Exercises
    3. For Loop Variations
      1. Modifying an Existing Object
      2. Looping Patterns
      3. Unknown Output Length
      4. Unknown Sequence Length
      5. Exercises
    4. For Loops Versus Functionals
      1. Exercises
    5. The Map Functions
      1. Shortcuts
      2. Base R
      3. Exercises
    6. Dealing with Failure
    7. Mapping over Multiple Arguments
      1. Invoking Different Functions
    8. Walk
    9. Other Patterns of For Loops
      1. Predicate Functions
      2. Reduce and Accumulate
      3. Exercises
  22. IV. Model
  23. 18. Model Basics with modelr
    1. Introduction
      1. Prerequisites
    2. A Simple Model
      1. Exercises
    3. Visualizing Models
      1. Predictions
      2. Residuals
      3. Exercises
    4. Formulas and Model Families
      1. Categorical Variables
      2. Interactions (Continuous and Categorical)
      3. Interactions (Two Continuous)
      4. Transformations
      5. Exercises
    5. Missing Values
    6. Other Model Families
  24. 19. Model Building
    1. Introduction
      1. Prerequisites
    2. Why Are Low-Quality Diamonds More Expensive?
      1. Price and Carat
      2. A More Complicated Model
      3. Exercises
    3. What Affects the Number of Daily Flights?
      1. Day of Week
      2. Seasonal Saturday Effect
      3. Computed Variables
      4. Time of Year: An Alternative Approach
      5. Exercises
    4. Learning More About Models
  25. 20. Many Models with purrr and broom
    1. Introduction
      1. Prerequisites
    2. gapminder
      1. Nested Data
      2. List-Columns
      3. Unnesting
      4. Model Quality
      5. Exercises
    3. List-Columns
    4. Creating List-Columns
      1. With Nesting
      2. From Vectorized Functions
      3. From Multivalued Summaries
      4. From a Named List
      5. Exercises
    5. Simplifying List-Columns
      1. List to Vector
      2. Unnesting
      3. Exercises
    6. Making Tidy Data with broom
  26. V. Communicate
  27. 21. R Markdown
    1. Introduction
      1. Prerequisites
    2. R Markdown Basics
      1. Exercises
    3. Text Formatting with Markdown
      1. Exercises
    4. Code Chunks
      1. Chunk Name
      2. Chunk Options
      3. Table
      4. Caching
      5. Global Options
      6. Inline Code
      7. Exercises
    5. Troubleshooting
    6. YAML Header
      1. Parameters
      2. Bibliographies and Citations
    7. Learning More
  28. 22. Graphics for Communication with ggplot2
    1. Introduction
      1. Prerequisites
    2. Label
      1. Exercises
    3. Annotations
      1. Exercises
    4. Scales
      1. Axis Ticks and Legend Keys
      2. Legend Layout
      3. Replacing a Scale
      4. Exercises
    5. Zooming
    6. Themes
    7. Saving Your Plots
      1. Figure Sizing
      2. Other Important Options
    8. Learning More
  29. 23. R Markdown Formats
    1. Introduction
    2. Output Options
    3. Documents
    4. Notebooks
    5. Presentations
    6. Dashboards
    7. Interactivity
      1. htmlwidgets
      2. Shiny
    8. Websites
    9. Other Formats
    10. Learning More
  30. 24. R Markdown Workflow
  31. Index

Product information

  • Title: R for Data Science
  • Author(s): Hadley Wickham, Garrett Grolemund
  • Release date: December 2016
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781491910344