When data is distributed very unevenly across keys, data skew can occur: a small number of compute nodes end up handling the bulk of the computation. The following settings tell Hive to optimize the job when a join is skewed:
> SET hive.optimize.skewjoin=true; --Set to true if there is data skew in the join. Default is false.
> SET hive.skewjoin.key=100000; --This is the default value. If the number of rows for a join key is bigger than
--this, the key is treated as skewed and its extra rows are sent to other, unused reducers.
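As a rough sketch of how these settings might be applied in practice, the session below first checks how rows are distributed over the join key and then enables skew join handling before running the join. The tables `orders` and `customers` and their columns are hypothetical names used only for illustration; they do not come from the text above.

```sql
-- Hypothetical tables (assumed for illustration only):
--   orders(customer_id, amount) and customers(customer_id, name)

-- Step 1: inspect the join key distribution to see whether a few keys dominate.
SELECT customer_id, COUNT(*) AS row_cnt
FROM orders
GROUP BY customer_id
ORDER BY row_cnt DESC
LIMIT 10;

-- Step 2: if the top keys have far more rows than the threshold,
-- enable skew join handling for this session.
SET hive.optimize.skewjoin=true;
SET hive.skewjoin.key=100000;

-- Step 3: run the join as usual; the query itself does not change.
SELECT c.name, SUM(o.amount) AS total_amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.name;
```

If the counts in step 1 all stay well below the `hive.skewjoin.key` threshold, the settings are unlikely to change how the join runs, so the check is a cheap way to decide whether they are worth enabling.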