Data Manipulation with R - Second Edition

Book Description

Efficiently perform data manipulation using the split-apply-combine strategy in R

In Detail

This book starts with installing R and using R and its libraries. It then discusses the modes and classes of R objects and highlights the different R data types along with their basic operations.

The primary focus is group-wise data manipulation with the split-apply-combine strategy, which is explained through specific examples. The book also covers specific libraries such as lubridate, reshape2, plyr, dplyr, stringr, and sqldf. You will learn not only about group-wise data manipulation, but also how to efficiently handle date, string, and factor variables, and how to work with different dataset layouts using the reshape2 package.

By the end of this book, you will have learned about text manipulation using stringr, how to extract data from Twitter using the twitteR library, how to clean raw data, and how to structure your raw data for data mining.

What You Will Learn

  • Learn about R data types and their basic operations

  • Work efficiently with string, factor, and date variables using packages such as stringr and lubridate

  • Understand group-wise data manipulation

  • Work with different layouts of R datasets and interchange between layouts for varied purposes

  • Manage bigger datasets using plyr and dplyr

  • Perform data manipulation with add-on packages such as plyr, reshape, stringr, lubridate, and sqldf

  • Manipulate datasets using SQL statements with the sqldf package

  • Clean and structure raw data for data mining using text manipulation

Table of Contents

    1. Data Manipulation with R Second Edition
      1. Table of Contents
      2. Data Manipulation with R Second Edition
      3. Credits
      4. About the Authors
      5. About the Reviewers
        1. Support files, eBooks, discount offers, and more
          1. Why subscribe?
          2. Free access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Errata
          3. Piracy
      8. 1. Introduction to R Data Types and Basic Operations
        1. Getting different versions of R
        2. Installing R on different platforms
        3. Installing and using R libraries
          1. Manually downloading and installing packages
          2. Installing packages within the R shell
        4. Comparing R with other software
        5. R as an enterprise solution
        6. Writing commands in R
        7. R data types and basic operations
          1. Modes and classes of R objects
        8. The R object structure and mode conversion
          1. Vector
        9. Factor and its types
          1. Data frame
          2. Matrices
          3. Arrays
          4. List
        10. Missing values in R
        11. Summary
      9. 2. Basic Data Manipulation
        1. Acquiring data
        2. Vector and matrix operations
        3. Factor manipulation
        4. Factors from numeric variables
        5. Date processing using lubridate
        6. Character manipulation
        7. Subscripting and subsetting
        8. Summary
      10. 3. Data Manipulation Using plyr and dplyr
        1. Applying the split-apply-combine strategy
        2. Introducing the plyr and dplyr libraries
          1. plyr's utilities
          2. Intuitive function names in the plyr library
          3. Inputs and arguments
          4. Multiargument functions
        3. Comparing base R and plyr
        4. Powerful data manipulation with dplyr
          1. Filtering and slicing rows
          2. Arranging rows
          3. Selecting and renaming
          4. Adding new columns
          5. Selecting distinct rows
          6. Column-wise descriptive statistics
          7. Group-wise operations
          8. Chaining
        5. Summary
      11. 4. Reshaping Datasets
        1. Typical layout of a dataset
          1. Long layout
          2. Wide layout
        2. New layout of a dataset
        3. Reshaping the dataset from the typical layout
        4. Reshaping the dataset with the reshape package
          1. Melting data
            1. Missing values in molten data
          2. Casting molten data
        5. The reshape2 package
        6. Summary
      12. 5. R and Databases
        1. R and different databases
          1. R and Excel
          2. R and MS Access
        2. Relational databases in R
          1. The filehash package
          2. The ff package
        3. R and sqldf
        4. Data manipulation using sqldf
        5. Summary
      13. 6. Text Manipulation
        1. Text data and its source
          1. Getting text data
        2. Text processing using default functions
        3. Working with Twitter data
        4. Summary
      14. Index