Special JOIN – MAPJOIN

The MAPJOIN statement performs the JOIN in the map phase only, without a reduce job. It loads all the data of the small table into memory and broadcasts it to every mapper. During the map phase, each row of the big table is compared with the in-memory small table(s) against the join conditions. Because no reduce phase (and therefore no shuffle) is needed, JOIN performance improves. When the hive.auto.convert.join setting is set to true, Hive automatically converts a JOIN to a MAPJOIN at runtime where possible, instead of relying on the MAPJOIN hint. In addition, MAPJOIN can speed up unequal (non-equi) joins, since both the MAPJOIN and the WHERE filter are performed in the map phase. The following is an example ...
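Independent of HiveQL syntax, the mechanics of a map-side join can be sketched in plain Python. This is a conceptual simulation under stated assumptions, not Hive's actual implementation: the function names (`build_hash_table`, `equi_join_mapper`, `theta_join_mapper`, `run_map_only_job`) and the sample tables are hypothetical. The small table is loaded once into an in-memory hash table (the "broadcast"), each simulated mapper streams its split of the big table and probes that table, and no shuffle or reduce step runs. The unequal join is modeled as a nested loop that applies a filter predicate inside the mapper, mirroring a WHERE condition evaluated in the map phase.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]
Pair = Tuple[Row, Row]


def build_hash_table(small_table: Iterable[Row], key: str) -> Dict[object, List[Row]]:
    """'Broadcast' step (hypothetical helper): load the whole small table into memory, keyed by the join column."""
    table: Dict[object, List[Row]] = defaultdict(list)
    for row in small_table:
        table[row[key]].append(row)
    return dict(table)


def equi_join_mapper(hash_table: Dict[object, List[Row]], key: str) -> Callable[[List[Row]], Iterator[Pair]]:
    """Mapper for an equality join: probe the in-memory hash table with each big-table row."""
    def mapper(split: List[Row]) -> Iterator[Pair]:
        for big_row in split:
            for small_row in hash_table.get(big_row[key], []):
                yield big_row, small_row
    return mapper


def theta_join_mapper(small_rows: List[Row], predicate: Callable[[Row, Row], bool]) -> Callable[[List[Row]], Iterator[Pair]]:
    """Mapper for an unequal join: compare each big-table row with every in-memory
    small-table row and keep only pairs that pass the filter (like a WHERE clause)."""
    def mapper(split: List[Row]) -> Iterator[Pair]:
        for big_row in split:
            for small_row in small_rows:
                if predicate(big_row, small_row):
                    yield big_row, small_row
    return mapper


def run_map_only_job(big_table: List[Row], n_splits: int, mapper: Callable[[List[Row]], Iterator[Pair]]) -> List[Pair]:
    """Split the big table and run the mapper on each split; there is no shuffle or reduce."""
    splits = [big_table[i::n_splits] for i in range(n_splits)]
    output: List[Pair] = []
    for split in splits:  # in a real cluster these would be parallel map tasks
        output.extend(mapper(split))
    return output


# Hypothetical sample data: a "big" table and two small tables.
employees = [
    {"name": "alice", "dept_id": 1, "salary": 5000},
    {"name": "bob", "dept_id": 2, "salary": 4000},
    {"name": "carol", "dept_id": 1, "salary": 6000},
    {"name": "dave", "dept_id": 3, "salary": 3000},
]
departments = [{"dept_id": 1, "dept": "HR"}, {"dept_id": 2, "dept": "IT"}]
salary_bands = [{"band": "low", "lo": 0, "hi": 4000}, {"band": "high", "lo": 4000, "hi": 10000}]

# Equality join on dept_id (dave has no matching department, so he is dropped).
equi = run_map_only_job(employees, 2, equi_join_mapper(build_hash_table(departments, "dept_id"), "dept_id"))

# Unequal join: lo <= salary < hi, evaluated entirely inside the mapper.
theta = run_map_only_job(
    employees, 2,
    theta_join_mapper(salary_bands, lambda e, b: b["lo"] <= e["salary"] < b["hi"]),
)

print(sorted((e["name"], d["dept"]) for e, d in equi))
# → [('alice', 'HR'), ('bob', 'IT'), ('carol', 'HR')]
print(sorted((e["name"], b["band"]) for e, b in theta))
# → [('alice', 'high'), ('bob', 'high'), ('carol', 'high'), ('dave', 'low')]
```

Hashing the small table makes each equality probe a constant-time lookup on average, whereas the unequal join must scan every small-table row for each big-table row. In both cases the small table has to fit in each mapper's memory, which is why MAPJOIN targets small tables.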
