Getting data out of Hadoop

We said earlier that the data flow between Hadoop and a relational database is rarely a one-way, linear process. Indeed, processing data within Hadoop and then inserting the results into a relational database is arguably the more common case. We will explore this now.

Writing data from within the reducer

Copying the output of a MapReduce job into a relational database raises considerations similar to those we met when importing data into Hadoop.

The obvious approach is to modify a reducer to generate the output for each key and its associated values and then to directly insert them into a database via JDBC. We do not have to worry about source column partitioning, as with the import ...
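The reducer-side insert described above can be sketched roughly as follows. This is a minimal illustration, not the book's code: it avoids the Hadoop classes so that it stands alone, and the names RowSink, JdbcRowSink, ListRowSink, and SummingDbReducer, the table job_output with columns k and total, and the choice of summing the values are all assumptions made for the example. In a real job the same logic would presumably live in a Reducer subclass, with the connection opened in setup() and closed in cleanup().

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical destination for one output row per key. */
interface RowSink extends AutoCloseable {
    void write(String key, long total) throws SQLException;

    @Override
    void close() throws SQLException;
}

/** JDBC-backed sink; the table and column names are assumptions. */
final class JdbcRowSink implements RowSink {
    private static final String SQL =
            "INSERT INTO job_output (k, total) VALUES (?, ?)";
    private final Connection conn;
    private final PreparedStatement stmt;
    private final int batchSize;
    private int pending = 0;

    JdbcRowSink(Connection conn, int batchSize) throws SQLException {
        this.conn = conn;
        this.batchSize = batchSize;
        conn.setAutoCommit(false);          // commit once per batch, not per row
        this.stmt = conn.prepareStatement(SQL);
    }

    @Override
    public void write(String key, long total) throws SQLException {
        stmt.setString(1, key);
        stmt.setLong(2, total);
        stmt.addBatch();
        if (++pending >= batchSize) {
            flush();
        }
    }

    private void flush() throws SQLException {
        stmt.executeBatch();
        conn.commit();
        pending = 0;
    }

    @Override
    public void close() throws SQLException {
        try {
            if (pending > 0) {
                flush();                    // send any rows still queued
            }
        } finally {
            stmt.close();
            conn.close();
        }
    }
}

/** In-memory sink, useful for trying the logic without a database. */
final class ListRowSink implements RowSink {
    final List<String> rows = new ArrayList<>();

    @Override
    public void write(String key, long total) {
        rows.add(key + "=" + total);
    }

    @Override
    public void close() {
        // nothing to release
    }
}

/** Mirrors a reducer: one call per key, one output row per key. */
final class SummingDbReducer {
    private final RowSink sink;

    SummingDbReducer(RowSink sink) {
        this.sink = sink;
    }

    void reduce(String key, Iterable<Long> values) throws SQLException {
        long total = 0;
        for (long v : values) {
            total += v;                     // example aggregation: sum the values
        }
        sink.write(key, total);
    }

    void cleanup() throws SQLException {
        sink.close();
    }
}
```

Batching the inserts and committing per batch keeps the number of database round trips down, which matters because every reduce task holds its own connection. Passing a JdbcRowSink built from a java.sql.Connection gives the database-writing behaviour; ListRowSink lets the same reducer logic be exercised without a database.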

Hadoop: Data Processing and Modelling by Garry Turkington, Tanmay Deshpande, Sandeep Karanth
