4.7. Representing Keys (or Other Binary Data) as English Text

Problem

You want to use an easy-to-read format for displaying keys (or fingerprints or some other interesting binary data). English would work better than a hexadecimal representation because people’s ability to recognize the key as correct by sight will be better.

Solution

Map a particular number of bits to a dictionary of words. The dictionary should be of such a size that an exact mapping to a number of bits is possible. That is, the dictionary should have a number of entries that is a power of two.

Discussion

The spc_bin2words() function shown here converts a binary string of the specified number of bytes into a string of English words. This function takes two arguments: str is the binary string to convert, and len is the number of bytes to be converted.

#include <string.h>
#include <stdlib.h>
#include "wordlist.h"
   
#define BITS_IN_LIST 11
#define MAX_WORDLEN  4
   
/* len parameter is measured in bytes.  Remaining bits padded with 0. */
unsigned char *spc_bin2words(const unsigned char *str, size_t len) {
  short         add_space = 0;
  size_t        i, leftbits, leftovers, scratch = 0, scratch_bits = 0;
  unsigned char *p, *res;
   
  res = (unsigned char *)malloc((len * 8 / BITS_IN_LIST + 1) * (MAX_WORDLEN + 1));
  if (!res) abort(  );
  res[0] = 0;
   
  for (i = 0;  i < len;  i++) {
    leftovers = str[i];
    leftbits = 8;
    while (leftbits) {
      if (scratch_bits + leftbits <= BITS_IN_LIST) {
        scratch |= (leftovers << (BITS_IN_LIST - leftbits - scratch_bits));
        scratch_bits += leftbits;
        leftbits = 0;
      } else {
        scratch |= (leftovers >> (leftbits - (BITS_IN_LIST - scratch_bits)));
        leftbits -= (BITS_IN_LIST - scratch_bits);
        leftovers &= ((1 << leftbits) - 1);
        scratch_bits = BITS_IN_LIST;
      }
      if (scratch_bits =  = BITS_IN_LIST) {
        p = words[scratch];
        /* The strcats are a bit inefficient because they start from the front of
         * the string each time.  But, they're less confusing, and these strings
         * should never get more than a few words long, so efficiency will
         * probably never be a real concern.
         */
        if (add_space) strcat(res, " ");
        strcat(res, p);
        scratch = scratch_bits = 0;
        add_space = 1;
      }
    }
  }
  if (scratch_bits) { /* Emit the final word */
    p = words[scratch];
    if (add_space) strcat(res, " ");
    strcat(res, p);
  }
  res = (unsigned char *)realloc(res, strlen(res) + 1);
  if (!res) abort(  ); /* realloc failed; should never happen, as size shrinks */
  return res;
}

To save space, the dictionary file (wordlist.h) is not provided here. Instead, you can find it on the book’s web site.

Warning

The previous code is subtly incompatible with the S/KEY dictionary because their dictionary is not in alphabetical order. (S/KEY is an authentication system using one-time passwords.) Be sure to use the right dictionary!

The code is written in such a way that you can use dictionaries of different sizes if you wish to encode a different number of bits per word. Currently, the dictionary encodes 11 bits of data (by having exactly 211 words), where no word is more than 4 characters long. The web site also provides a dictionary that encodes 13 bits of data, where no word is more than 6 letters long. The previous code can be modified to use the larger dictionary simply by changing the two appropriate preprocessor definitions at the top.

The algorithm takes 11 bits of the binary string, then finds the word that maps to the unique 11-bit value. Note that it is rare for the number of bits represented by a single word to align exactly to a byte. For example, if you were to encode a 2-byte binary string, those 16 bits would be encoded by 2 words, which could represent up to 22 bits. Therefore, there will usually be leftover bits. In the case of 2 bytes, there are 6 leftover bits. The algorithm sets all leftover bits to 0.

Because of this padding scheme, the output doesn’t always encode how many bytes were in the input. For example, if the output is 6 words long, the input could have been either 7 or 8 bytes long. Therefore, you need to manually truncate the output to the desired length.

See Also

Recipe 4.8

Get Secure Programming Cookbook for C and C++ now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.