The metadata depends on the data that exists outside the pond and on the physical organization of the pond itself. If the data is stored in a standard DBMS outside the pond, many (or all) of those characteristics will be carried inside. In this case, the analyst can expect to find the same records, attributes, keys, and indexes. But if the data is stored in document form outside the data pond, then the analyst can expect to find the data organized document by document. Even in the case of data stored in a “schema on read” system, metadata is still needed. However the data is physically organized inside the pond, it will be described by metadata. Without the metadata descriptions, the analyst would have a hard time ...
- Chapter 5 Generic Structure of the Data Pond
- from Data Lake Architecture: Designing the Data Lake and Avoiding the Garbage Dump
- Publisher: Technics Publications
- Released: April 2016
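
The passage's point, that whatever the physical organization inside the pond, metadata describes it, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the book: the class names `RelationalMetadata` and `DocumentMetadata`, their fields, and the helpers `describe` and `read_with_schema` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RelationalMetadata:
    """Hypothetical metadata carried in from a DBMS source."""
    table: str
    attributes: list[str]
    keys: list[str]
    indexes: list[str] = field(default_factory=list)


@dataclass
class DocumentMetadata:
    """Hypothetical metadata for data organized document by document."""
    collection: str
    known_fields: list[str]


def describe(meta) -> str:
    """Return a human-readable description an analyst could consult."""
    if isinstance(meta, RelationalMetadata):
        return (f"table {meta.table}: attributes={meta.attributes}, "
                f"keys={meta.keys}, indexes={meta.indexes}")
    if isinstance(meta, DocumentMetadata):
        return f"documents in {meta.collection}: fields={meta.known_fields}"
    raise TypeError(f"unsupported metadata type: {type(meta).__name__}")


def read_with_schema(raw_document: dict, meta: DocumentMetadata) -> dict:
    """Schema on read: the stored document has no enforced schema, so the
    metadata is applied at read time to pick out the described fields.
    Fields missing from the document come back as None."""
    return {name: raw_document.get(name) for name in meta.known_fields}
```

In this sketch, even the schema-on-read path depends on `DocumentMetadata` to tell the reader which fields to expect, which mirrors the claim that metadata is still needed in that case.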