Loading a subset of text files

Sometimes we only need some parts of the dataset for an analysis, stored in a database backend or in flat files. In such situations, loading only the relevant subset of the data frame will result in much more speed improvement compared to any performance tweaks and custom packages discussed earlier.

Let's imagine we are only interested in flights to Nashville, where the annual useR! conference took place in 2012. This means we need only those rows of the CSV file where the Dest equals BNA (this International Air Transport Association airport code stands for Nashville International Airport).

Instead of loading the whole dataset in 160 to 2,000 milliseconds (see the previous section) and then dropping the unrelated rows ...

Get Mastering Data Analysis with R now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.