An interesting architectural question is: once raw data flows from the raw data pond into one of the downstream data ponds, should the raw data remain in the raw data pond? The answer is no. Once raw data passes from the raw data pond to the analog data pond, the application data pond, or the textual data pond, it is best to remove the source data from the raw data pond. The raw data has already served its purpose, and it would be extremely rare for analytical processing ever to be performed in the raw data pond. The raw data pond then becomes a “holding cell” for a jumble of data, as seen in Fig. 4.2.
- Chapter 4 Data Ponds
- from Data Lake Architecture: Designing the Data Lake and Avoiding the Garbage Dump
- Publisher: Technics Publications
- Released: April 2016
Use of raw data pond
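
The move-then-remove policy described in the passage could be sketched roughly as follows. This is an illustration only, not code from the book: the ponds are modeled as plain in-memory dictionaries, and the names `Pond`, `promote`, and the sample record ids are all hypothetical. Any transformation the data undergoes on its way into a downstream pond is left out.

```python
from typing import Dict

# Hypothetical stand-in for a pond: record id -> raw payload.
Pond = Dict[str, bytes]


def promote(raw_pond: Pond, target_pond: Pond, record_id: str) -> None:
    """Move one record from the raw pond into a downstream pond.

    The record is written to the target first and removed from the raw
    pond only after the write has been confirmed, so a failed write
    never loses data.
    """
    payload = raw_pond[record_id]      # KeyError if the record is not in the raw pond
    target_pond[record_id] = payload   # write to the downstream pond first
    if target_pond.get(record_id) != payload:
        raise RuntimeError(f"write of {record_id!r} could not be confirmed")
    del raw_pond[record_id]            # the raw copy has served its purpose


# Assumed sample data: one record bound for the analog data pond, one not yet moved.
raw: Pond = {"sensor-001": b"21.5C", "invoice-17": b"total=40"}
analog: Pond = {}

promote(raw, analog, "sensor-001")
```

After the call, `sensor-001` exists only in `analog`, while `invoice-17` is still waiting in `raw`. The raw pond holds only data that has not yet been moved on.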