MAD Skills and Cosmos

In "MAD Skills: New Analysis Practices for Big Data," a paper from the 2009 VLDB conference, the authors describe the analysis environment at Fox Interactive Media (FIM) in detail. Using a combination of Hadoop and the Greenplum database system, the team at FIM built a data processing platform that, though developed in isolation from our work at Facebook, looks quite familiar.

The paper's title refers to three tenets of the FIM platform: Magnetic, Agile, and Deep. "Magnetic" refers to the desire to store all data from the enterprise, not just the structured data that fits into the enterprise data model. Along the same lines, an "Agile" platform should handle schema evolution gracefully, enabling analysts to work with data immediately and evolve the data model as needed. "Deep" refers to the practice of performing more complex statistical analyses over data.

In the FIM environment, data is separated into staging, production, reporting, and sandbox schemas within a single Greenplum database, an arrangement quite similar to the multiple tiers inside Hadoop at Facebook described earlier.

Separately, Microsoft has published details of its data management stack. In papers titled "Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks" and "SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets," Microsoft describes an information platform remarkably similar to the one we had built at Facebook. Its infrastructure includes a distributed filesystem called Cosmos and a system for parallel ...
