Chapter 4. Data storage on the batch layer

This chapter covers

Storage requirements for the master dataset
Distributed filesystems
Improving efficiency with vertical partitioning

In the last two chapters you learned about a data model for the master dataset and how you can translate that data model into a graph schema. You saw the importance of making data immutable and eternal. The next step is to learn how to physically store that data in the batch layer. Figure 4.1 recaps where we are in the Lambda Architecture.

Figure 4.1. The batch layer must structure large, continually growing datasets in a manner that supports low maintenance as well as efficient creation of the batch views.

Like the last two chapters, this chapter is dedicated ...

Big Data by Nathan Marz, James Warren
