In the late 1940s, Claude Shannon was working on encoding problems at Bell Labs. More specifically, Shannon was interested in establishing reliable transmission of information between a sender S and a recipient R. Think of S as randomly generating messages composed of sequences of words. The fundamental question Shannon was trying to solve is: what is the optimal encoding that makes the encoded messages as short as possible and what is the average length of these encoded words?
The intuition behind the solution that he found is easy to grasp:
Frequent words should be encoded with short strings, whereas infrequent words can be encoded with longer strings.
In this section, we will show how the randomness of messages that S generates is actually related to the amount of information contained in those messages.
Remember that the three forms of information theory complexity that we review in this appendix are all defined as the value of information contained in binary strings once a goal has been specified for what to do with these strings. For the case at hand, Shannon entropy, the goal is to compress the random messages that S sends, as much as possible.
Let us be just a little bit more precise here. Assume that the sender S uses only a finite number N of words that we denote wi where the index i runs from 1 to N. Assume, moreover, that the words wi are generated with probabilities pi (numbers that sum up to 1). ...