Chapter 7. Running Sqoop Jobs

In this chapter, we will see how to run Sqoop jobs from Oozie. Sqoop (SQL to Hadoop) is used to transfer data between relational database systems and the Hadoop platform, in both directions: importing into Hadoop and exporting back out.

In this chapter, we will:

  • Run Sqoop jobs from the command line
  • Create an Oozie Workflow for Sqoop actions
  • Run Sqoop jobs from Coordinators

From a conceptual point of view, we will:

  • Understand the concept of HCatalog Datasets
  • Understand HCatalog Coordinator and EL functions

Chapter case study

Let's add a twist to the rainfall use case we solved in the previous chapter. Instead of receiving the rainfall data as CSV files, we now need to import it from a MySQL database before processing it.

As the first step of the analysis, we need to bring ...
