Scheduling

The Hadoop system can have multiple jobs and these have to be scheduled many times. Hadoop jobs' scheduling is complex and difficult to create, manage, and monitor. We can use a system such as Oozie to coordinate and monitor Hadoop jobs efficiently, as mentioned next:

  • Apache Oozie: Oozie is a workflow and coordination service processing system that lets the users manage multiple jobs as well as chain of jobs written in MapReduce, Pig, and Hive, also java programs and shell sripts too, and can link them to one another. Oozie is an extensible, scalable, and data-aware service. Oozie can be used to set rules for beginning and ending a workflow and it can also detect the completion of tasks.

Get Hadoop Essentials now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.