Appendix C

Compression

Compression has been a recurring theme in this book, for image formats (such as GIF and PNG), command-line compression tools (such as bzip2 and gzip), and HTTP compression through gzip and deflate. Despite this diversity in how compression is used, the majority of implementations actually employ the same underlying algorithms. This appendix provides a brief examination of the most common compression methods.

THE LZW FAMILY

The origins of the widely used LZW family of compression algorithms have their roots in two papers written by Abraham Lempel and Jacob Ziv in 1977 and 1978. These papers defined the LZ77 and LZ78 algorithms, two contrasting approaches to achieve compression by spotting repeating patterns.

LZ77

LZ77 looks for patterns of repeating data in previously seen input and replaces the data with a reference to the first instance of the pattern. The reference takes the form of a length-offset pair, indicating the length of the pattern, and it’s offset (from the current position). Consider the following stream of data:

ABCDEFXYZABCD

The second instance of ABCD can be replaced with a length-offset pair, like so:

ABCDEFXYZ(4,9)

Here, the number 4 indicates the length of the pattern, and the number 9 indicates the offset (nine characters back from the current position).

Sliding Window

In many cases, it would be impractical for LZ77 to search the whole stream of previous data for matches. It could potentially be hundreds of megabytes, and the CPU and memory ...

Get Professional Website Performance: Optimizing the Front-End and Back-End now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.