Apache Sqoop Cookbook

By Jarek Jarcec Cecho... Published by O'Reilly Media, Inc.

Chapter 6. Hadoop Ecosystem Integration

The previous chapters described the various use cases where Sqoop enables highly efficient data transfers between Hadoop and relational databases. This chapter will focus on integrating Sqoop with the rest of the Hadoop ecosystem: we will show you how to run Sqoop from within a specialized Hadoop scheduler named Oozie and how to load your data into Hadoop’s data warehouse system, Apache Hive, and Hadoop’s database, Apache HBase.

Scheduling Sqoop Jobs with Oozie

Problem

You are using Oozie in your environment to schedule Hadoop jobs and would like to call Sqoop from within your existing workflows.

Solution

Oozie includes special Sqoop actions that you can use to call Sqoop in your workflow. For example:

<workflow-app name="sqoop-workflow" xmlns="uri:oozie:workflow:0.1">
    ...
    <action name="sqoop-action">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>foo:8021</job-tracker>
            <name-node>bar:8020</name-node>
            <command>import --table cities --connect ...</command>
        </sqoop>
        <ok to="next"/>
        <error to="error"/>
    </action>
    ...
</workflow-app>
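
The fragment above elides the nodes surrounding the action. As a minimal sketch of how a complete workflow might look, the following fills in a start node, an end node, and a kill node for the error transition. The node names `end` and `fail`, the JDBC URL, and the credentials are hypothetical placeholders, not values from this recipe; adjust them to your environment.

```xml
<workflow-app name="sqoop-workflow" xmlns="uri:oozie:workflow:0.1">
    <!-- Entry point: transition straight to the Sqoop action -->
    <start to="sqoop-action"/>

    <action name="sqoop-action">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>foo:8021</job-tracker>
            <name-node>bar:8020</name-node>
            <!-- Hypothetical connection details, for illustration only -->
            <command>import --table cities --connect jdbc:mysql://mysql.example.com/sqoop --username sqoop --password sqoop</command>
        </sqoop>
        <!-- On success finish the workflow; on failure go to the kill node -->
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <!-- Hypothetical kill node that reports why the action failed -->
    <kill name="fail">
        <message>Sqoop action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>

    <end name="end"/>
</workflow-app>
```

In a larger workflow, the `ok` and `error` transitions would typically point at whatever nodes follow the import rather than directly at the end and kill nodes.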

Discussion

Oozie has had built-in support for Sqoop since version 3.2.0. You can use the dedicated Sqoop action type in the same way you would use a MapReduce action. There are two ways to specify Sqoop parameters. The first is to list all of them in a single <command> tag, for example:

<command>import --table cities --username sqoop --password sqoop ...</command>

In this case, Oozie will take ...
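
For comparison, Oozie's Sqoop action also accepts a sequence of `<arg>` elements in place of a single `<command>` element, with each parameter in its own tag. The following is a sketch of the same import written in that form, assuming the host names from the Solution above; the parameters elided in the original command remain elided here.

```xml
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
    <job-tracker>foo:8021</job-tracker>
    <name-node>bar:8020</name-node>
    <!-- One parameter per arg element, in command-line order -->
    <arg>import</arg>
    <arg>--table</arg>
    <arg>cities</arg>
    <arg>--username</arg>
    <arg>sqoop</arg>
    <arg>--password</arg>
    <arg>sqoop</arg>
    <!-- Remaining parameters elided, as in the command example -->
</sqoop>
```

Because each value sits in its own element, this form may be the safer choice when a parameter value itself contains spaces.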