MapReduce program to partition data using a custom partitioner

In this recipe, we are going to learn how to write a MapReduce program that partitions data using a custom partitioner.

Getting ready

To perform this recipe, you should have a running Hadoop cluster as well as an IDE such as Eclipse.

How to do it...

During the shuffle and sort phase, Hadoop uses a hash partitioner by default unless a different partitioner is specified. We can also write our own partitioner with custom partitioning logic, so that records are routed to specific reducers and therefore end up in separate output files.
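To make the default behavior concrete, here is a minimal plain-Java sketch of the arithmetic that Hadoop's `HashPartitioner` applies to each key. The class name `HashPartitionSketch`, the method name `partitionFor`, and the sample keys are hypothetical and chosen only for illustration; in a real job the same expression sits inside the `getPartition` method of `org.apache.hadoop.mapreduce.lib.partition.HashPartitioner`.

```java
// Plain-Java sketch of the default hash partitioning arithmetic.
// HashPartitionSketch and partitionFor are hypothetical names; this class
// does not depend on Hadoop and only mirrors the expression it uses.
final class HashPartitionSketch {

    // Maps a key to a partition number in the range [0, numReduceTasks).
    // Masking with Integer.MAX_VALUE clears the sign bit, so a negative
    // hashCode() can never yield a negative partition number.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 3; // assumed reducer count, for illustration only
        for (String key : new String[] {"2012", "2013", "2014"}) {
            System.out.println(key + " -> partition " + partitionFor(key, reducers));
        }
    }
}
```

Because the partition depends only on the key's hash code, related keys are spread across reducers in no meaningful order, which is the motivation for supplying custom logic. Note that Hadoop key types such as `Text` compute their own hash codes, so the partition numbers printed here for `String` keys will not necessarily match those of a real job.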

Let's consider an example in which we have user data that includes each user's year of joining. Now, assume that we have to partition the users by the year of joining specified in each record. ...
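As a rough illustration of what year-based partitioning logic could look like, the following plain-Java sketch maps a joining year to a partition number. It is not the recipe's own code: the class name `YearPartitionSketch`, the method `partitionForYear`, the assumption that the year arrives as a numeric string, and the modulo rule are all hypothetical. In an actual job, logic of this kind would go in the `getPartition` method of a class extending `org.apache.hadoop.mapreduce.Partitioner`, registered on the job with `setPartitionerClass`, with the reducer count set through `setNumReduceTasks`.

```java
// Plain-Java sketch of a year-based partitioning rule.
// All names and the modulo rule itself are assumptions for illustration.
final class YearPartitionSketch {

    // Parses the joining year and maps it to a partition in
    // [0, numReduceTasks). Math.floorMod keeps the result non-negative,
    // and consecutive years land in different partitions as long as
    // there are at least as many reducers as distinct years.
    static int partitionForYear(String joiningYear, int numReduceTasks) {
        int year = Integer.parseInt(joiningYear.trim());
        return Math.floorMod(year, numReduceTasks);
    }

    public static void main(String[] args) {
        int reducers = 3; // assumed: one reducer per distinct year
        for (String year : new String[] {"2012", "2013", "2014"}) {
            System.out.println(year + " -> partition " + partitionForYear(year, reducers));
        }
    }
}
```

With three reducers and three distinct years, each year goes to its own reducer, so each output file would hold the users for exactly one year. If the data contained more distinct years than reducers, several years would share a partition under this rule.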
