Learning Hadoop 2 by Garry Turkington, Gabriele Modena

Summary

We hope this chapter presented data life cycle management as something more than a dry, abstract concept. We covered a lot of ground, in particular:

  • The definition of data life cycle management and how it covers a number of issues and techniques that usually become important with large data volumes
  • The concept of building a data ingest pipeline along good data life cycle management principles that can then be utilized by higher-level analytic tools
  • Oozie as a Hadoop-focused workflow manager and how we can use it to compose a series of actions into a unified workflow
  • Various Oozie tools, such as subworkflows, parallel action execution, and global variables, that allow us to apply true design principles to our workflows
  • HCatalog and how ...
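
As a reminder of how the Oozie features listed above fit together, the following is a minimal sketch of a workflow definition, not one of the chapter's own workflows. The workflow name, action names, and paths (`ingest-sketch`, `ingest-a`, `ingest-b`, `/apps/ingest-a`, `/tmp/sketch`) are hypothetical, and `${jobTracker}` and `${nameNode}` are assumed to be supplied through the job properties. It assumes a workflow schema version that supports the `global` element.

```xml
<!-- Hypothetical sketch: names and paths are illustrative only -->
<workflow-app xmlns="uri:oozie:workflow:0.5" name="ingest-sketch">
    <!-- Global settings shared by every action in this workflow -->
    <global>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
    </global>

    <start to="fork-ingest"/>

    <!-- Fork: both paths are started in parallel -->
    <fork name="fork-ingest">
        <path start="ingest-a"/>
        <path start="ingest-b"/>
    </fork>

    <!-- First branch: delegate to a separate subworkflow -->
    <action name="ingest-a">
        <sub-workflow>
            <app-path>${nameNode}/apps/ingest-a</app-path>
            <propagate-configuration/>
        </sub-workflow>
        <ok to="join-ingest"/>
        <error to="fail"/>
    </action>

    <!-- Second branch: a simple filesystem action -->
    <action name="ingest-b">
        <fs>
            <mkdir path="${nameNode}/tmp/sketch"/>
        </fs>
        <ok to="join-ingest"/>
        <error to="fail"/>
    </action>

    <!-- Join: wait for both branches before finishing -->
    <join name="join-ingest" to="end"/>

    <kill name="fail">
        <message>Workflow failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>

    <end name="end"/>
</workflow-app>
```

Keeping each branch in its own subworkflow is one way to let the pieces be developed and tested separately, while the parent workflow only describes how they are composed.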
