Chapter 3. Acquire and Prepare the Ingredients – Your Data

In this chapter, we will cover:

  • Reading data from CSV files
  • Reading XML data
  • Reading JSON data
  • Reading data from fixed-width formatted files
  • Reading data from R data files and R libraries
  • Removing cases with missing values
  • Replacing missing values with the mean
  • Removing duplicate cases
  • Rescaling a variable to [0,1]
  • Normalizing or standardizing data in a data frame
  • Binning numerical data
  • Creating dummies for categorical variables

Introduction

Data analysts need to load data from many different input formats into R. Although R has its own native data format, data usually exists in text formats such as CSV (comma-separated values), JSON (JavaScript Object Notation), and XML (Extensible Markup Language ...