Chapter 8. Oozie Bundles

Up to this point, we have covered two fundamental Oozie concepts, the workflow and the coordinator, along with everything that goes into authoring and implementing them. Workflows are at the core of any Oozie application, and coordinators are the next level of abstraction, orchestrating those workflows through time and data triggers, as explained in Chapters 6 and 7. In this chapter, we cover Oozie bundles, the highest level of abstraction in Oozie, which lets users package a set of coordinator applications into a single entity, often called a data pipeline.

Bundle Basics

Oozie’s evolutionary path gives us a lot of context on how bundles were born. Oozie version 1.0 was all about workflows and the basic features around them. Version 2.0 introduced coordinators and triggers. Bundles were the next step and were introduced in version 3.0. There is a clear rhythm to this evolutionary arc: at every stage, users wanted higher-level abstractions and more features from a Hadoop-based workflow engine. Bundles were the direct result of users wanting Oozie to support large data pipelines involving many workflows with complex interdependencies.

Bundle Definition

An Oozie bundle is a collection of Oozie coordinator applications with a directive on when to kick off those coordinators. Like the other parts of Oozie, bundles are defined via an XML-based language, called the Bundle Specification Language. Bundles ...
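
To make the definition above concrete, here is a minimal sketch of what a bundle definition might look like. It is illustrative only: the bundle name, coordinator names, application paths, kick-off time, and the ${nameNode} variable are hypothetical placeholders, and the schema version shown assumes the namespace that shipped with the initial bundle release.

```xml
<!-- Illustrative sketch: all names, paths, and times below are hypothetical. -->
<bundle-app name="example-data-pipeline" xmlns="uri:oozie:bundle:0.1">
    <!-- The directive on when to kick off the coordinators in this bundle -->
    <controls>
        <kick-off-time>2014-01-01T00:00Z</kick-off-time>
    </controls>
    <!-- Each coordinator element points at one coordinator application -->
    <coordinator name="ingest-coord">
        <app-path>${nameNode}/user/example/apps/ingest-coord</app-path>
    </coordinator>
    <coordinator name="aggregate-coord">
        <app-path>${nameNode}/user/example/apps/aggregate-coord</app-path>
    </coordinator>
</bundle-app>
```

In this sketch, the two coordinator elements are the collection of coordinator applications, and the kick-off-time element under controls carries the directive on when to start them.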
