The importance of metadata

Before we look at why metadata matters, let's define what it is. Metadata is simply data about data. This can sound confusing because the definition appears circular: it describes data in terms of data.

A typical big data system involves three layers of activity:

  • Applications writing data to a big data system
  • Organizing data within the big data system
  • Applications consuming data from the big data system

These layers bring up a few challenges, because the big data system may store millions (even billions) of data files or segments. We should be able to correctly identify the ownership and usage of these data files across the enterprise.
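
To make the idea of tracking ownership and usage concrete, here is a minimal sketch in Python. It is not taken from Hadoop or any specific metadata tool: the `FileMetadata` record, the in-memory `catalog`, the helper functions, and the sample paths and names are all hypothetical and only illustrate what "data about data" might look like for a single data file.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FileMetadata:
    """Hypothetical metadata record describing one data file."""
    path: str                 # where the data file lives
    owner: str                # application or team that wrote the file
    size_bytes: int           # size of the file itself
    consumers: List[str] = field(default_factory=list)  # applications that read it


# Illustrative in-memory catalog keyed by file path (an assumption,
# not how any particular big data system stores its metadata).
catalog: Dict[str, FileMetadata] = {}


def register(meta: FileMetadata) -> None:
    """Add or overwrite the metadata record for a file."""
    catalog[meta.path] = meta


def files_owned_by(owner: str) -> List[str]:
    """Return the paths of all files written by the given owner."""
    return sorted(m.path for m in catalog.values() if m.owner == owner)


def files_used_by(consumer: str) -> List[str]:
    """Return the paths of all files read by the given consumer."""
    return sorted(m.path for m in catalog.values() if consumer in m.consumers)


# Made-up sample entries for illustration.
register(FileMetadata("/data/ingest/part-0001", "ingest-app", 1024, ["reporting"]))
register(FileMetadata("/data/ingest/part-0002", "ingest-app", 2048, ["reporting", "billing"]))
register(FileMetadata("/data/archive/part-0001", "archive-app", 4096))
```

With records like these, questions such as "who owns this file?" or "which files does the billing application read?" become simple lookups over the metadata, without opening the data files themselves.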

Let's take an example of a TV broadcasting company ...
