Compression codecs
Codecs (coder/decoders) are used to compress and decompress data using various compression algorithms. gzip, bzip2, lzo, and snappy are supported by Flume, although you may have to install lzo yourself, especially if you are using a distribution such as CDH due to licensing issues.
If you want the HDFS sink to write compressed files, set the hdfs.codeC property. The property is also used as the file suffix for the files written to HDFS. For example, if you specify the codec as follows, all files written will have a .gzip extension, so you don't need to specify an hdfs.fileSuffix property in this case:
agent.sinks.k1.hdfs.codeC=gzip
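To put that single line in context, here is a minimal sketch of what a fuller HDFS sink definition might look like. The channel name c1 and the HDFS path are hypothetical placeholders, not values from this chapter. The hdfs.fileType setting is an assumption about what you want: the Flume User Guide lists CompressedStream as the file type that requires hdfs.codeC to be set, while the default SequenceFile type can also be used with a codec.

```properties
# Hypothetical names: agent "agent", sink "k1", channel "c1"
agent.sinks.k1.type=hdfs
agent.sinks.k1.channel=c1

# Placeholder path; substitute your own HDFS location
agent.sinks.k1.hdfs.path=/flume/events

# Assumption: write a plain compressed stream rather than a SequenceFile
agent.sinks.k1.hdfs.fileType=CompressedStream

# Codec used to compress the files written by this sink
agent.sinks.k1.hdfs.codeC=gzip
```

Switching to another supported codec should only require changing the last line, for example to snappy or bzip2, provided the corresponding libraries are available on the machine running the agent.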
Which codec you choose to use will require some ...