Chapter 9. Advanced Topics

In the previous chapters, we largely focused on Oozie’s three abstractions (workflow, coordinator, and bundle) and explained their basic and most common usage. In this chapter, we discuss some of the advanced concepts concerning the workflow and the coordinator. For workflows, we present how to manage JARs and how to execute MapReduce jobs written using the new Hadoop API. We also elaborate on the security features in Oozie. For the coordinator, we demonstrate how to use cron-type scheduling and how to support HCatalog-based data dependencies.

Managing Libraries in Oozie

In general, managing different JARs while allowing users the flexibility to include their own custom JARs for their applications is a challenge for any Java-based system. In the previous chapters, we briefly covered some simple examples of JAR management in Oozie. We will discuss a few other important scenarios in this section.

Origin of JARs in Oozie

Before going into the details of JAR management, let’s see the different types of JARs Oozie needs to maintain. The JARs in Oozie largely come from the following sources:

System JARs

These are the JARs that run the Oozie services. They are generated during an Oozie build and included in the Oozie web application archive (oozie.war) file, as discussed in “Install Oozie Server”.

Hadoop JARs

These JARs are required for Oozie to communicate with Hadoop services. Hadoop ...
