In the world of Big Data, information is customarily stored in a “schema on read” manner. The data is first stored in a block of data. When a query is made against the data, the system reads the block and determines the schema inside it. Organizing data this way allows very large amounts of data to be stored efficiently. However, retrieval and analysis can then impose significant overhead on the system: every time data is accessed under a “schema on read” organization, all the data in the pond must be accessed.
- Chapter 5 Generic Structure of the Data Pond
- from Data Lake Architecture: Designing the Data Lake and Avoiding the Garbage Dump
- Publisher: Technics Publications
- Released: April 2016
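
The "schema on read" behavior described in the passage can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the book: the `RAW_BLOCK` list stands in for a stored block of data, the records are assumed to be JSON lines, and `read_with_schema` and `total_amount` are hypothetical helper names. It only aims to show that the schema is worked out when the data is read, and that each query reads the whole block again.

```python
import json

# Hypothetical raw block: records are kept exactly as written,
# with no schema enforced at write time (an assumption for this sketch).
RAW_BLOCK = [
    '{"id": 1, "name": "Ada", "amount": "12.50"}',
    '{"id": 2, "name": "Grace"}',
    '{"id": 3, "amount": "7.25", "region": "EU"}',
]


def read_with_schema(raw_block):
    """Parse every raw record and infer the schema at read time."""
    records = [json.loads(line) for line in raw_block]
    # The schema is simply the set of field names seen across all records.
    schema = sorted({field for record in records for field in record})
    return schema, records


def total_amount(raw_block):
    """Answer one query; the whole block is parsed again on every call."""
    _, records = read_with_schema(raw_block)
    return sum(float(r["amount"]) for r in records if "amount" in r)


schema, _ = read_with_schema(RAW_BLOCK)
total = total_amount(RAW_BLOCK)
print(schema)
print(total)
```

Writing is cheap here because records are appended without validation, while every call to `total_amount` pays the cost of parsing the full block, which mirrors the overhead the passage describes.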