Data pipelines

In real Big Data projects, Coordinators are the scheduled tasks that make up a data pipeline. For example, one Coordinator fetches data from a source system and processes it, and another Coordinator sends the processed data to a database. Finally, both are grouped together to form a Bundle. To work out how to solve your job using Oozie, start by drawing the job Workflow on a whiteboard or paper. Then discuss with your team how you can break it into unit abstractions that run individually and in isolation.
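As a sketch of that grouping (the names and paths here are illustrative, assuming Oozie's standard bundle schema), the two Coordinators could be wrapped into a single Bundle like this:

```xml
<!-- bundle.xml: groups the two illustrative Coordinators into one Bundle -->
<bundle-app name="pipeline-bundle" xmlns="uri:oozie:bundle:0.2">
  <!-- Coordinator 1: fetch data from the source system and process it -->
  <coordinator name="import-and-process">
    <app-path>${nameNode}/apps/import-coordinator</app-path>
  </coordinator>
  <!-- Coordinator 2: send the processed data to the database -->
  <coordinator name="export-to-db">
    <app-path>${nameNode}/apps/export-coordinator</app-path>
  </coordinator>
</bundle-app>
```

Each `app-path` points to the HDFS directory holding that Coordinator's own `coordinator.xml`, so each unit can still be submitted and tested on its own before being run as part of the Bundle.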

Check out the following example.

The database has a record of daily rainfall in Melbourne. We import that data into Hadoop using a regular Coordinator job (Coordinator 1). Using another ...
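A minimal sketch of what a daily import Coordinator such as Coordinator 1 might look like follows; the app name, dates, and paths are illustrative, assuming Oozie's standard coordinator schema:

```xml
<!-- coordinator.xml: runs the rainfall-import Workflow once a day -->
<coordinator-app name="rainfall-import" frequency="${coord:days(1)}"
                 start="2015-01-01T00:00Z" end="2015-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- HDFS path of the Workflow that performs the actual import -->
      <app-path>${nameNode}/apps/rainfall-import-workflow</app-path>
    </workflow>
  </action>
</coordinator-app>
```

The `frequency` expression uses Oozie's EL function `coord:days(1)` to schedule one run per day between the `start` and `end` datetimes.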
