Chapter 5. Efficient Input/Output

This chapter explains how to efficiently read and write data in R. Input/output (I/O) is the technical term for reading and writing data: the process of getting information into a particular computer system (in this case, R) and then exporting it to the outside world again (in this case, as a file format that other software can read). Data I/O will be needed on projects where data comes from, or goes to, external sources. However, the majority of R resources and documentation start with the optimistic assumption that your data has already been loaded, ignoring the fact that importing datasets into R and exporting them to the world outside the R ecosystem can be a time-consuming and frustrating process. Tricky, slow, or ultimately unsuccessful data I/O can cripple efficiency right at the outset of a project. Conversely, reading and writing your data efficiently will make your R projects more likely to succeed in the outside world.

The first section introduces rio, a meta package for efficiently reading and writing data in a range of file formats. rio requires only two intuitive functions for data I/O, making it efficient to learn and use. Next, we explore in more detail efficient functions for reading files stored in common plain text file formats from the readr and data.table packages. Binary formats, which can dramatically reduce file sizes and read/write times, are covered next.

With the accelerating digital revolution and growth in open data, ...

Get Efficient R Programming now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.