Compression techniques in Hive can significantly reduce the amount of data transferred between mappers and reducers when intermediate and final output data are properly compressed. Because less data has to be written to disk and sent over the network, queries usually run faster, although compression and decompression cost some extra CPU time. To compress the intermediate files produced between multiple MapReduce jobs, we need to set the following property (false by default) in the command-line session or in the hive-site.xml file:
> SET hive.exec.compress.intermediate=true;
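A SET statement only affects the current session. To make the setting persistent, the same property can be added to hive-site.xml instead. The following is a minimal sketch of that entry, assuming it is placed inside the file's existing `<configuration>` element:

```xml
<!-- Compress the intermediate data passed between chained MapReduce jobs (default: false) -->
<property>
  <name>hive.exec.compress.intermediate</name>
  <value>true</value>
</property>
```

A value set with SET in a session still overrides the hive-site.xml value for that session only.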
Next, we need to decide which compression codec to use. The commonly supported codecs are listed in the following table:
| Compression | Codec | Extension | Splittable |
|---|---|---|---|
| Deflate | org.apache.hadoop.io.compress.DefaultCodec | .deflate | N |
| Gzip | org.apache.hadoop.io.compress.GzipCodec ... |