Chapter 1. Understanding Metadata: Create the Foundation for a Scalable Data Architecture

Key Challenges of Building Next-Generation Data Architectures

Today’s technology and software advances allow us to process and analyze huge amounts of data. Big data is clearly a hot topic, and organizations are investing heavily in it. In addition to scale, however, we also need to account for the variety of the data being analyzed. Data variety means that datasets can be stored in many formats and storage systems, each of which has its own characteristics. Accounting for data variety is difficult, but it enables a 360-degree approach: a full view of your customers, providers, and operations. To enable this 360-degree approach, we need to implement next-generation data architectures. In doing so, the main question becomes: how do you create an agile data platform that accounts for both data variety and the scale of future data?

For today’s forward-looking organizations, the answer increasingly lies in a data lake. A data lake is a single, common repository that manages data from transactional databases and operational stores, as well as data generated outside the transactional enterprise systems. The data lake supports data from many different sources, such as files, clickstreams, IoT sensor data, social network data, and SaaS application data.

A core tenet of the data lake is ...
