Summary
Hopefully, this chapter presented data life cycle management as something other than a dry, abstract concept. We covered a lot of ground, in particular:
- The definition of data life cycle management and how it covers a number of issues and techniques that usually become important with large data volumes
- The concept of building a data ingest pipeline along good data life cycle management principles that can then be utilized by higher-level analytic tools
- Oozie as a Hadoop-focused workflow manager and how we can use it to compose a series of actions into a unified workflow
- Various Oozie features, such as subworkflows, parallel action execution, and global variables, that allow us to apply sound design principles, such as modularity and reuse, to our workflows
- HCatalog and how ...
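As a recap of the Oozie features listed above, the following is a minimal sketch of a `workflow.xml` combining a global configuration section, a fork/join pair for parallel execution, and sub-workflow actions. All names, paths, and properties (`ingest-pipeline`, `${workflowRoot}`, and so on) are illustrative, not taken from the chapter:

```xml
<!-- Illustrative workflow.xml: names and paths are examples only -->
<workflow-app name="ingest-pipeline" xmlns="uri:oozie:workflow:0.4">
    <!-- Global section: properties inherited by every action below -->
    <global>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
    </global>
    <start to="parallel-ingest"/>
    <!-- Fork/join pair runs the two sub-workflows in parallel -->
    <fork name="parallel-ingest">
        <path start="ingest-logs"/>
        <path start="ingest-events"/>
    </fork>
    <action name="ingest-logs">
        <sub-workflow>
            <app-path>${workflowRoot}/ingest-logs</app-path>
            <propagate-configuration/>
        </sub-workflow>
        <ok to="merge"/>
        <error to="fail"/>
    </action>
    <action name="ingest-events">
        <sub-workflow>
            <app-path>${workflowRoot}/ingest-events</app-path>
            <propagate-configuration/>
        </sub-workflow>
        <ok to="merge"/>
        <error to="fail"/>
    </action>
    <join name="merge" to="end"/>
    <kill name="fail">
        <message>Ingest failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Decomposing each ingest source into its own sub-workflow keeps the top-level definition short and lets the pieces be reused or tested independently, which is the kind of design principle the chapter applies to pipelines.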