Defining Data Lake

In the preceding sections, we briefly reviewed how traditional systems evolved over time and where they fall short in handling newer forms of data. In this section, let us discover what a Data Lake is and how it addresses those gaps, which are really opportunities in disguise.

A Data Lake can be defined in several ways. At its core, it is a data storage and processing repository in which all of an organization's data can be placed, so that data from every internal and external system, partner, and collaborator flows into it and insights spring out.

The following list summarizes what a Data Lake is:

  • A Data Lake is a huge repository that holds every kind of data in its raw format until it is needed by anyone ...
