Chapter 4. Data storage on the batch layer

This chapter covers

  • Storage requirements for the master dataset
  • Distributed filesystems
  • Improving efficiency with vertical partitioning

In the last two chapters, you learned about a data model for the master dataset and how to translate that data model into a graph schema. You saw the importance of making data immutable and eternal. The next step is to learn how to physically store that data in the batch layer. Figure 4.1 recaps where we are in the Lambda Architecture.

Figure 4.1. The batch layer must structure large, continually growing datasets in a manner that supports low maintenance as well as efficient creation of the batch views.

Like the last two chapters, this chapter is dedicated ...
