Sequence file

A sequence file is a flat file consisting of binary key/value pairs. They are extensively used in MapReduce (https://wiki.apache.org/hadoop/MapReduce) as input/output formats. They are mostly used for intermediate data storage within a sequence of MapReduce jobs. Sequence files work well as containers for small files. If there are too many small files in HDFS, they can be packed in a sequence file to make file processing efficient. There are three formats of sequence files: uncompressed, record compressed, and block compressed key/value records. Sequence files support block-level compression but do not support schema evolution.

Get Modern Big Data Processing with Hadoop now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.