Mastering Algorithms with C

Description of LZ77

LZ77 (Lempel-Ziv-1977) is a simple but surprisingly effective form of data compression that takes an entirely different approach from Huffman coding. LZ77 is a dictionary-based method, which means that it tries to compress data by encoding long strings of symbols, called phrases, as small tokens that reference entries in a dictionary. Compression is achieved by using relatively small tokens in place of longer phrases that appear several times in the data. As with Huffman coding, it is important to realize that a symbol is not necessarily a character of text: a symbol can be any amount of data we choose, but it is often one byte’s worth.

Maintaining a Dictionary of Phrases

Different dictionary-based compression methods use various approaches for maintaining their dictionaries. LZ77 uses a look-ahead buffer and a sliding window. LZ77 works by first loading a portion of the data into the look-ahead buffer. To understand how the look-ahead buffer stores phrases that effectively form a dictionary, picture the buffer as a sequence of symbols s₁, ..., sₙ, and P_b as a set of phrases constructed from the symbols. From the sequence s₁, ..., sₙ, we form n phrases, defined as:

P_b = {(s₁), (s₁, s₂), ..., (s₁, ..., sₙ)}

This means that if the look-ahead buffer contains the symbols (A, B, D), for example, the phrases in the buffer are {(A), (A, B), (A, B, D)}. Once data passes through the look-ahead buffer, it moves into the sliding window and becomes part ...
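The phrase set P_b is simply the set of prefixes of the look-ahead buffer, which can be sketched in a few lines of C. This is a minimal illustration, not the book's implementation: the function names format_phrase and print_phrases are hypothetical, and each symbol is assumed to be one printable byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the k-th phrase of the buffer, that is, its
 * first k symbols, as text such as "(A, B, D)". Symbols are assumed to be
 * single printable bytes. Returns the number of characters written
 * (excluding the terminating NUL), or -1 if k is 0 or out is too small. */
int format_phrase(const unsigned char *buf, size_t k, char *out, size_t outsize)
{
    size_t pos = 0;

    assert(buf != NULL && out != NULL);

    /* Space needed: "(" + k symbols + (k - 1) ", " separators + ")" + NUL,
     * which adds up to 3k + 1 bytes. */
    if (k == 0 || outsize < 3 * k + 1)
        return -1;

    out[pos++] = '(';
    for (size_t i = 0; i < k; i++) {
        if (i > 0) {
            out[pos++] = ',';
            out[pos++] = ' ';
        }
        out[pos++] = (char)buf[i];
    }
    out[pos++] = ')';
    out[pos] = '\0';

    return (int)pos;
}

/* Prints every phrase in P_b for a buffer of n symbols, one per line.
 * The k-th phrase is the prefix (s_1, ..., s_k), so there are n phrases. */
void print_phrases(const unsigned char *buf, size_t n)
{
    char line[256];

    for (size_t k = 1; k <= n; k++) {
        if (format_phrase(buf, k, line, sizeof line) > 0)
            printf("%s\n", line);
    }
}
```

Calling print_phrases((const unsigned char *)"ABD", 3) prints the three phrases from the example above: (A), (A, B), and (A, B, D). Note that every phrase starts at the first symbol of the buffer, so a buffer of n symbols yields exactly n phrases.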

Mastering Algorithms with C by Kyle Loudon
