Summary

Hopefully, this chapter presented the topic of data life cycle management as something other than a dry, abstract concept. In particular, we covered:

  • The definition of data life cycle management and how it covers a number of issues and techniques that usually become important with large data volumes
  • The concept of building a data ingest pipeline around good data life cycle management principles, producing data that higher-level analytic tools can then consume
  • Oozie as a Hadoop-focused workflow manager and how we can use it to compose a series of actions into a unified workflow
  • Various Oozie features, such as subworkflows, parallel action execution, and global variables, that allow us to apply sound design principles to our workflows (illustrated in the sketch following this list)
  • HCatalog and how ...
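
To make those Oozie features concrete, here is a minimal sketch of a workflow.xml combining a fork/join for parallel execution, a subworkflow action, and a global section (available since workflow schema 0.4). The application name, node names, script, and sub-workflow path below are hypothetical placeholders, not taken from the chapter:

<!-- A minimal sketch of an Oozie workflow.xml (schema 0.4); all names,
     paths, and scripts below are hypothetical placeholders. -->
<workflow-app xmlns="uri:oozie:workflow:0.4" name="ingest-pipeline">

    <!-- Global variables: defaults inherited by every action,
         so individual actions need not repeat them. -->
    <global>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
    </global>

    <start to="ingest-fork"/>

    <!-- Parallel action execution: both paths run concurrently. -->
    <fork name="ingest-fork">
        <path start="validate-data"/>
        <path start="prepare-partitions"/>
    </fork>

    <!-- An ordinary Pig action on one branch of the fork. -->
    <action name="validate-data">
        <pig>
            <script>validate.pig</script>
        </pig>
        <ok to="ingest-join"/>
        <error to="fail"/>
    </action>

    <!-- A subworkflow action on the other branch, delegating to a
         separately maintained workflow definition. -->
    <action name="prepare-partitions">
        <sub-workflow>
            <app-path>${nameNode}/workflows/prepare-partitions</app-path>
            <propagate-configuration/>
        </sub-workflow>
        <ok to="ingest-join"/>
        <error to="fail"/>
    </action>

    <!-- The join waits for every forked path before continuing. -->
    <join name="ingest-join" to="end"/>

    <kill name="fail">
        <message>Ingest failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>

Factoring the second branch into a subworkflow keeps each definition small and lets the same ingest logic be reused from other workflows, which is exactly the kind of design principle the chapter advocates.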
