Encoding a string using a Huffman tree

A Huffman tree allows efficient encoding of data by calculating a probability distribution of characters to optimize the space taken per character. Imagine compressing this book into one piece of paper and back without any information loss. Huffman trees allow this type of optimal lossless data compression based on statistics.

In this recipe, we will implement a Huffman tree from a source of text and produce a string representation of its Huffman codes.

For example, the string "hello world" contains 11 characters, which, depending on the encoding and architecture, may take up as few as 11 bytes of space to represent. The code in this recipe will transform the string into just 51 bits, or 6.375 bytes.

Getting ...

Get Haskell Data Analysis Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.