Implementation and Analysis of LZ77

With LZ77, we try to compress data by encoding phrases from a look-ahead buffer as tokens referencing phrases in a sliding window. To uncompress the data, we decode each token into the phrase or symbol it represents. To do this, we must continually update the sliding window so that at any one time it looks the same as it did during the compression process. In the implementation presented here, a symbol in the original data is one byte.

lz77_compress

The lz77_compress operation (see Example 14.5) compresses data using LZ77. It begins by writing the number of symbols in the data to the buffer of compressed data and initializing the sliding window and look-ahead buffer. The look-ahead buffer is then loaded with symbols.

Compression takes place inside of a loop that iterates until there are no more symbols to process. We use ipos to keep track of the current byte being processed in the original data, and opos to keep track of the current bit we are writing to the buffer of compressed data. During each iteration of the loop, we call compare_win to determine the longest phrase in the look-ahead buffer that matches one in the sliding window. The compare_win function returns the length of the longest match.

When a match is found, compare_win sets offset to the position of the match in the sliding window and next to the symbol in the look-ahead buffer immediately after the match. In this case, we write a phrase token to the compressed data (see Figure 14.4 ...

Get Mastering Algorithms with C now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.