“When spider webs unite they can tie up a lion.”
- African Proverb
1. Each node in a Hadoop cluster may hold only a portion of a table.
2. For two rows to be joined, Hadoop requires that both rows be physically on the same reducer (or, in a Map Join, in the same mapper).
3. To bring matching rows together, Hadoop either redistributes one or both tables by the join key, or duplicates the smaller table to every task (every mapper, in a Map Join). The redistributed or duplicated copy exists only for the life of the Join.
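The two strategies in item 3 can be simulated in plain Python. This is a minimal sketch, not Hadoop code: the sample tables, the reducer count, and the function names (`partition`, `shuffle_join`, `map_join`) are hypothetical. `shuffle_join` redistributes both tables by hashing the join key, so matching rows always land on the same reducer; `map_join` instead keeps a full copy of the small table in memory and streams the big table past it, so the big table never moves.

```python
from collections import defaultdict

# Hypothetical sample tables: (join_key, payload) tuples.
ORDERS = [(1, "order-a"), (2, "order-b"), (1, "order-c"), (3, "order-d")]
CUSTOMERS = [(1, "Alice"), (2, "Bob"), (4, "Dana")]

NUM_REDUCERS = 3  # assumption: an arbitrary reducer count for the simulation


def partition(rows, num_reducers):
    """Redistribute rows so every row with the same key lands on the same reducer."""
    buckets = defaultdict(list)
    for key, value in rows:
        buckets[hash(key) % num_reducers].append((key, value))
    return buckets


def shuffle_join(left, right, num_reducers):
    """Redistribute BOTH tables by join key, then join inside each reducer."""
    left_parts = partition(left, num_reducers)
    right_parts = partition(right, num_reducers)
    result = []
    for reducer in range(num_reducers):
        # Inside one reducer: index the right side, then probe it with the left side.
        lookup = defaultdict(list)
        for key, value in right_parts[reducer]:
            lookup[key].append(value)
        for key, value in left_parts[reducer]:
            for match in lookup.get(key, []):
                result.append((key, value, match))
    return result


def map_join(big, small):
    """Duplicate the small table into each task's memory; the big table is not redistributed."""
    small_lookup = defaultdict(list)
    for key, value in small:
        small_lookup[key].append(value)
    # Each mapper would hold its own full copy of small_lookup and stream its slice of `big`.
    return [(key, value, match)
            for key, value in big
            for match in small_lookup.get(key, [])]


print(sorted(shuffle_join(ORDERS, CUSTOMERS, NUM_REDUCERS)))
# [(1, 'order-a', 'Alice'), (1, 'order-c', 'Alice'), (2, 'order-b', 'Bob')]
```

Both functions return the same rows; the difference is what has to move. The shuffle version moves every row of both tables, while the map version copies only the small table, which is why it is preferred when one side fits in memory.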
The three options are a Shuffle Join, a Map Join, or a Sort-Merge-Bucket (SMB) Join.
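The third option, the Sort-Merge-Bucket Join, avoids both redistribution and duplication. When both tables are already stored bucketed on the join key into the same number of buckets, and each bucket is sorted by that key, bucket *i* of one table can only match bucket *i* of the other, and each pair can be merged in a single sorted pass. The sketch below is hypothetical: `key % num_buckets` stands in for the real bucketing hash, and the data and function names are illustrative only.

```python
from collections import defaultdict

# Hypothetical sample tables: (join_key, payload) tuples with integer keys.
ORDERS = [(1, "order-a"), (2, "order-b"), (1, "order-c"), (3, "order-d")]
CUSTOMERS = [(1, "Alice"), (2, "Bob"), (4, "Dana")]

NUM_BUCKETS = 2  # assumption: both tables were written with the same bucket count


def bucket_and_sort(rows, num_buckets):
    """Simulate a table stored bucketed and sorted on its join key."""
    buckets = defaultdict(list)
    for key, value in rows:
        buckets[key % num_buckets].append((key, value))  # stand-in for the bucketing hash
    return {b: sorted(buckets[b]) for b in range(num_buckets)}


def merge_bucket(left, right):
    """Merge-join two lists already sorted by key, handling duplicate keys."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, then pair each equal left row with it.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            while i < len(left) and left[i][0] == lk:
                for _, rv in right[j:j_end]:
                    out.append((lk, left[i][1], rv))
                i += 1
            j = j_end
    return out


def smb_join(left_rows, right_rows, num_buckets):
    """Join matching bucket pairs only; no rows cross between buckets."""
    left = bucket_and_sort(left_rows, num_buckets)
    right = bucket_and_sort(right_rows, num_buckets)
    result = []
    for b in range(num_buckets):
        result.extend(merge_bucket(left[b], right[b]))
    return result


print(sorted(smb_join(ORDERS, CUSTOMERS, NUM_BUCKETS)))
# [(1, 'order-a', 'Alice'), (1, 'order-c', 'Alice'), (2, 'order-b', 'Bob')]
```

The merge step never holds a whole table in memory and never rehashes rows; the cost is paid earlier, when the tables are written bucketed and sorted.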