Apache Sqoop Cookbook

By Jarek Jarcec Cecho... Published by O'Reilly Media, Inc.

Chapter 2. Importing Data

The next few chapters, starting with this one, are devoted to transferring data from your relational database or warehouse system to the Hadoop ecosystem. In this chapter we will cover the basic use cases of Sqoop, describing various situations where you have data in a single table in a database system (e.g., MySQL or Oracle) that you want to transfer into the Hadoop ecosystem.

We will be describing various Sqoop features through examples that you can copy and paste to the console and then run. In order to do so, you will need to first set up your relational database. For the purpose of this book, we will use a MySQL database with the account sqoop and password sqoop. We will be connecting to a database named sqoop. You can easily create the credentials using the script mysql.credentials.sql uploaded to the GitHub project associated with this book.
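The setup described above might be sketched as follows. This is only an illustration, not the book's own listing: the host name `localhost` and the use of the MySQL `root` account to load the credentials script are assumptions, and the script only attempts each step if the corresponding tool (and file) is actually available on your machine.

```shell
#!/bin/sh
# Connection details taken from the text: account "sqoop", password "sqoop",
# database "sqoop". DB_HOST is an assumption; point it at your MySQL server.
DB_HOST="localhost"
DB_NAME="sqoop"
DB_USER="sqoop"
DB_PASS="sqoop"

# JDBC connection string in the form expected by Sqoop's --connect option.
JDBC_URL="jdbc:mysql://${DB_HOST}/${DB_NAME}"

# Step 1: create the credentials with the script from the book's GitHub project.
# Assumes the file is in the current directory and that an administrative
# MySQL account (here: root, an assumption) may run it.
if command -v mysql >/dev/null 2>&1 && [ -f mysql.credentials.sql ]; then
  mysql -h "$DB_HOST" -u root -p < mysql.credentials.sql
else
  echo "Skipping credentials setup (mysql client or mysql.credentials.sql not found)"
fi

# Step 2: confirm that Sqoop can reach the database with the new account.
if command -v sqoop >/dev/null 2>&1; then
  sqoop list-tables --connect "$JDBC_URL" --username "$DB_USER" --password "$DB_PASS"
else
  echo "sqoop not found on PATH; would connect to: $JDBC_URL"
fi
```

If you use different credentials or a different database, only the variables at the top need to change. Passing the password on the command line is convenient for a throwaway test account like this one, but it is visible to other users on the machine.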

You can always change the examples if you want to use different credentials or connect to a different relational system (e.g., Oracle, PostgreSQL, Microsoft SQL Server, or any others). Further details will be provided later in the book.

As Sqoop is focused primarily on transferring data, we need to have some data already available in the database before running the Sqoop commands. To have something to start with, we’ve created the table cities containing a few cities from around the world (see Table 2-1). You can use the script mysql.tables.sql from the aforementioned GitHub project to create and populate all tables ...
