  • Laith Al Obaidy thinks this is interesting:

To improve JOIN performance, here are some suggestions:

  • Join the biggest table first, followed by the smaller tables
  • Order the subsequent tables by filter selectivity, starting with the table that has the most selective filter
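
The ordering rule in the two suggestions above can be sketched in a few lines of Python. This is only an illustration, not something taken from the book: the `Table` class, the `join_order` function, the table names, the row counts, and the selectivity figures are all hypothetical, and "selectivity" is assumed here to mean the fraction of rows a table's filter keeps (lower means more selective).

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    row_count: int      # approximate number of rows (made-up figures)
    selectivity: float  # assumed: fraction of rows kept by the filter; lower = more selective


def join_order(tables):
    """Return table names: the largest table first, then the rest
    from the most selective filter to the least selective."""
    if not tables:
        return []
    biggest = max(tables, key=lambda t: t.row_count)
    rest = sorted(
        (t for t in tables if t is not biggest),
        key=lambda t: t.selectivity,
    )
    return [biggest.name] + [t.name for t in rest]


# Hypothetical tables for illustration only.
tables = [
    Table("customers", 1_000_000, 0.50),
    Table("sales", 500_000_000, 1.00),
    Table("stores", 5_000, 0.01),
]

order = join_order(tables)
print(order)  # ['sales', 'stores', 'customers']
```

In an actual Impala query this order would be the order in which the tables are written in the FROM and JOIN clauses. Impala's `STRAIGHT_JOIN` keyword, placed right after `SELECT`, asks the planner to keep that written order instead of reordering the joins itself.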

From Learning Cloudera Impala

Note

This is very important because in many cases jobs run out of memory due to heavy JOIN operations.