Chapter 4. Working with Large Datasets

This chapter focuses more on the data than on its presentation. In the real world, data can grow, and it can grow quickly. Working with large datasets, and with the large files that hold the raw data, can be a challenge. In this chapter we discuss version control, storage, performance, and benchmarking with large datasets in mind.

Git and Large Files

Git can handle whatever you ask it to handle, including binary files and very large files. Remote hosted Git services, on the other hand, may impose size restrictions. For example, GitHub and Stash limit files to 100 MB.
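
One way to avoid a rejected push is to scan the working tree for oversized files before committing. The following is a minimal sketch in Python, not a method from this chapter: the function name `find_oversized_files` is hypothetical, and the limit is assumed to be 100 × 1024 × 1024 bytes, so adjust it to whatever your hosting service actually enforces.

```python
import os

# Assumption: the 100 MB per-file cap, taken here as 100 * 1024 * 1024 bytes.
SIZE_LIMIT_BYTES = 100 * 1024 * 1024


def find_oversized_files(root, limit=SIZE_LIMIT_BYTES):
    """Return (path, size) pairs under root whose size exceeds limit, largest first."""
    oversized = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune Git's own metadata directory so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # Skip symbolic links; only regular files are measured.
                continue
            size = os.path.getsize(path)
            if size > limit:
                oversized.append((path, size))
    return sorted(oversized, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Report every file in the current directory tree that is over the limit.
    for path, size in find_oversized_files("."):
        print(f"{size / (1024 * 1024):8.1f} MiB  {path}")
```

Run from the root of a repository, the script lists each file over the limit with its size, which gives you a chance to decide how to handle those files before they end up in the history.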

Before you commit all the things, here are a couple of points to consider. Data files may be a point-in-time snapshot, or only needed ...

