Cover by Kathleen Ting, Jarek Jarcec Cecho

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

O'Reilly logo

Chapter 3. Incremental Import

So far we’ve covered use cases where you had to transfer an entire table’s contents from the database into Hadoop as a one-time operation. What if you need to keep the imported data on Hadoop in sync with the source table on the relational database side? While you could obtain a fresh copy every day by reimporting all data, that would not be optimal. The amount of time needed to import the data would increase in proportion to the amount of additional data appended to the table daily. This would put an unnecessary performance burden on your database. Why reimport data that has already been imported? For transferring deltas of data, Sqoop offers the ability to do incremental imports.

Examples in this chapter use the table visits, which can be created by the script mysql.tables.sql described in Chapter 2.

Importing Only New Data

Problem

You have a database table with an INTEGER primary key. You are only appending new rows, and you need to periodically sync the table’s state to Hadoop for further processing.

Solution

Activate Sqoop’s incremental feature by specifying the --incremental parameter. The parameter’s value will be the type of incremental import. When your table is only getting new rows and the existing ones are not changed, use the append mode.

Incremental import also requires two additional parameters: --check-column indicates a column name that should be checked for newly appended data, and --last-value contains the last value that successfully imported ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required