Chapter 11. Oozie Operations

We covered all the functional aspects of Oozie in Chapters 4 through 8. We learned how to write workflows, coordinators, and bundles, and mastered the fundamentals of Oozie. Chapters 9 and 10 covered advanced topics like security and developer extensions. In this final chapter, we will cover several operational aspects of Oozie. We will start with the details of the Oozie CLI tool and the REST API. We will then look at the Oozie server and explore some tips on administering and tuning it for better stability and performance. We will also cover typical operational topics like retrying and reprocessing Oozie jobs. Last but not least, we will look at debugging techniques and resolutions for some common failures. Along the way, we will sprinkle in a few topics that are useful but don’t quite fit in any of the previous chapters.

Oozie CLI Tool

The primary interface for managing and interacting with Oozie is oozie, the command-line utility that we have used throughout this book (e.g., to submit jobs, check their status, and kill them). Internally, it uses Oozie’s web service (WS) API, which we will look at in detail in the next section. The CLI is available on the Oozie client node, which is typically also the Hadoop edge node with access to all the Hadoop ecosystem CLI clients and tools, such as Hadoop, Hive, Pig, and Sqoop. This edge node is also usually configured to reach the Hadoop cluster, the Hive metastore, and the Oozie server. The ...
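
Because the CLI is a thin client over the WS API, each `oozie job` command boils down to an HTTP request against the Oozie server. The following is a minimal sketch of that correspondence, assuming the version 1 job endpoints (`/v1/job/<job-id>` with `show=info` for status queries and `action=...` for state changes) and Oozie's default HTTP port of 11000. The server host, the job ID, and the helper names `CLI_TO_WS` and `ws_request` are hypothetical and used only for illustration; the code builds the requests but does not send them.

```python
from urllib.parse import urlencode

# Hypothetical server URL; 11000 is Oozie's default HTTP port.
OOZIE_URL = "http://oozie-host.example.com:11000/oozie"

# Assumed mapping from a few `oozie job` CLI options to the WS API
# request each one corresponds to: (HTTP method, query parameters).
CLI_TO_WS = {
    "-info": ("GET", {"show": "info"}),
    "-kill": ("PUT", {"action": "kill"}),
    "-suspend": ("PUT", {"action": "suspend"}),
    "-resume": ("PUT", {"action": "resume"}),
}


def ws_request(oozie_url: str, option: str, job_id: str) -> tuple[str, str]:
    """Return the (method, URL) pair matching `oozie job <option> <job_id>`."""
    if option not in CLI_TO_WS:
        raise ValueError(f"unsupported option: {option}")
    method, params = CLI_TO_WS[option]
    # Strip any trailing slash so the path joins cleanly.
    url = f"{oozie_url.rstrip('/')}/v1/job/{job_id}?{urlencode(params)}"
    return method, url


if __name__ == "__main__":
    job_id = "0000001-000000000000000-oozie-oozi-W"  # hypothetical job ID
    for opt in ("-info", "-kill"):
        method, url = ws_request(OOZIE_URL, opt, job_id)
        print(f"oozie job {opt} {job_id}  ->  {method} {url}")
```

Seeing the CLI this way makes it easier to reason about failures: if the edge node cannot reach the Oozie server's HTTP endpoint, every CLI command will fail in the same way.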
