Tera-Tom Genius Series - Hadoop Architecture and SQL by Jason Nolander, Tom Coffing

Chapter 6 – Join Functions

“When spider webs unite they can tie up a lion.”

- African Proverb

Hadoop Joins

1. Each node in a Hadoop cluster may hold only a portion of a table.

2. For two rows to be joined, Hadoop requires that both rows be physically on the same reducer.

3. To ensure matching rows are on the same reducer, Hadoop either redistributes one or both tables by the join key or duplicates the smaller table across all reducers. This redistribution or duplication lasts only for the life of the join.
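The redistribute-versus-duplicate choice in point 3 can be sketched as a small in-memory simulation. This is not Hadoop code: the toy tables, the helper names `hash_partition`, `local_join`, `shuffle_join`, and `broadcast_join`, and the use of Python's built-in `hash` to pick a reducer are all hypothetical illustrations. Each inner list stands for the slice of a table held on one node.

```python
from collections import defaultdict

def hash_partition(rows, num_reducers):
    """Redistribute rows so every row with the same join key lands on the same reducer."""
    reducers = [[] for _ in range(num_reducers)]
    for key, value in rows:
        reducers[hash(key) % num_reducers].append((key, value))
    return reducers

def local_join(left_rows, right_rows):
    """Join two row lists that already sit together (simple in-memory hash join)."""
    index = defaultdict(list)
    for key, value in right_rows:
        index[key].append(value)
    return [(key, lv, rv) for key, lv in left_rows for rv in index.get(key, [])]

def shuffle_join(left_nodes, right_nodes, num_reducers):
    """Redistribute BOTH tables by join key, then join each reducer's rows locally."""
    left = hash_partition([r for node in left_nodes for r in node], num_reducers)
    right = hash_partition([r for node in right_nodes for r in node], num_reducers)
    result = []
    for r in range(num_reducers):
        result.extend(local_join(left[r], right[r]))
    return result

def broadcast_join(big_nodes, small_nodes):
    """Duplicate the small table to every worker; the big table's rows never move."""
    small_copy = [r for node in small_nodes for r in node]  # full copy of the small table
    result = []
    for node in big_nodes:  # each worker joins its local slice against the copy
        result.extend(local_join(node, small_copy))
    return result

# Hypothetical data: (join_key, payload) rows spread across two nodes per table.
orders = [[(1, "o1"), (2, "o2")], [(1, "o3"), (3, "o4")]]
customers = [[(1, "Ann")], [(2, "Bob"), (4, "Dee")]]

print(sorted(shuffle_join(orders, customers, num_reducers=3)))
# [(1, 'o1', 'Ann'), (1, 'o3', 'Ann'), (2, 'o2', 'Bob')]
print(sorted(broadcast_join(orders, customers)))
# [(1, 'o1', 'Ann'), (1, 'o3', 'Ann'), (2, 'o2', 'Bob')]
```

Both strategies produce the same rows; the difference is what moves. The shuffle variant moves both tables, while the broadcast variant copies only the small one, which is why duplication pays off when one table is much smaller than the other.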

The three options are a Shuffle Join, a Map Join, or a Sort-Merge-Bucket Join.

Two joining rows must be in the memory of the same reducer. The three options are a Shuffle Join, a Map Join ...
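The Sort-Merge-Bucket option can be sketched the same way. The assumption here, which the text above does not spell out, is that both tables were written ahead of time into the same number of buckets by the join key, with every bucket sorted by that key. Bucket k of one table then only has to be merged with bucket k of the other, so no rows are redistributed at join time. The helpers `bucketize`, `merge_join`, and `smb_join` are hypothetical names for illustration only.

```python
def bucketize(rows, num_buckets):
    """Load-time step: hash rows into num_buckets buckets, each sorted by join key."""
    buckets = [[] for _ in range(num_buckets)]
    for key, value in rows:
        buckets[hash(key) % num_buckets].append((key, value))
    return [sorted(b, key=lambda row: row[0]) for b in buckets]

def merge_join(left, right):
    """Merge two key-sorted row lists, handling duplicate keys on both sides."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out

def smb_join(left_buckets, right_buckets):
    """Join bucket k with bucket k only; assumes equal bucket counts on both tables."""
    if len(left_buckets) != len(right_buckets):
        raise ValueError("both tables must use the same number of buckets")
    out = []
    for lb, rb in zip(left_buckets, right_buckets):
        out.extend(merge_join(lb, rb))
    return out

# Hypothetical data, bucketed once by join key.
order_buckets = bucketize([(1, "o1"), (2, "o2"), (1, "o3"), (3, "o4")], 2)
customer_buckets = bucketize([(1, "Ann"), (2, "Bob"), (4, "Dee")], 2)

print(sorted(smb_join(order_buckets, customer_buckets)))
# [(1, 'o1', 'Ann'), (1, 'o3', 'Ann'), (2, 'o2', 'Bob')]
```

Because matching keys always hash to the same bucket number in both tables, the join never needs to hold a whole table in memory; it streams through one pair of sorted buckets at a time.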