Chapter 7: The Data Lake

A major industry change is under way in how organizations store, manage, and analyze data. Not since the introduction of the data warehouse in the late 1980s have we seen something with the potential to transform how organizations leverage data and analytics to power their key business initiatives and rewire their value creation processes. This new data and analytics architecture is called the data lake, and it has the potential to be even more impactful than the data warehouse in transforming the way organizations integrate data and analytics into their business models. But as in all things related to big data, organizations must “think differently” about how they design, deploy, and manage their data architecture.

Today's data warehouses are extremely expensive. As a result, most organizations limit how much data they store in their data warehouse, opting for 13 to 25 months of summarized data versus 15 to 25 years of detailed transactional and operational data. Unfortunately for data warehouses, it is in that detailed transactional, operational, sensor, wearable, and social data, together with the growing body of internal and external unstructured data, that actionable insights about your customers, products, campaigns, partners, and operations can be found.

For example, over the past 15 years, the US economy has gone through two full economic cycles where the economy was flying high, collapsed, and then recovered. By looking at each of their ...
