4.12. Encrypting in a Single Reduced Character Set

Problem

You’re storing data in a format in which particular characters are invalid. For example, you might be using a database, and you’d like to encrypt all the fields, but the database does not support binary strings. You want to avoid growing the message itself (sometimes database fields have length limits) and thus want to avoid encoding binary data into a representation like base64.

Solution

Encrypt the data using a stream cipher (or a block cipher in a streaming mode). Do so in such a way that you map each byte of output to a byte in the valid character set.

For example, let’s say that your character set is the 64 characters consisting of all uppercase and lowercase letters, the 10 numerical digits, the space, and the period. For each character, do the following:

  1. Map the input character to a number from 0 to 63.

  2. Take a byte of output from the stream cipher and reduce it modulo 64.

  3. Add the reduced keystream value to the character’s number, reducing the result modulo 64.

  4. The result will be a value from 0 to 63. Map it back into the desired character set.

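Decryption follows the same steps, except that step 3 subtracts the keystream value (modulo 64) instead of adding it. Unlike XOR, addition modulo 64 is not its own inverse, so running the encryption process a second time does not recover the plaintext.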
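
The four steps can be sketched for a single character as follows. This is a minimal illustration rather than production code: the names CHARSET64, char_to_index( ), encrypt_char64( ), and decrypt_char64( ) are hypothetical, and the ordering of characters within the set is an arbitrary choice. The sketch takes every keystream byte as given, which is safe here only because 256 is a multiple of 64. Note that the decryption direction subtracts the keystream value, because addition modulo 64 does not undo itself.

```c
#include <assert.h>
#include <string.h>

/* The 64-character set from the example: letters, digits, space, period.
 * The order is an arbitrary choice made for this sketch. */
static const char CHARSET64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789 .";

/* Step 1: map a character to its index 0..63, or -1 if it is not in the set. */
static int char_to_index(char ch) {
  const char *p;

  if (ch == '\0') return -1;  /* strchr() would otherwise match the terminator */
  p = strchr(CHARSET64, ch);
  return p ? (int)(p - CHARSET64) : -1;
}

/* Steps 2-4: reduce the keystream byte modulo 64, add it to the index
 * modulo 64, and map the result back into the character set. */
static char encrypt_char64(char ch, unsigned char ks) {
  int idx = char_to_index(ch);

  assert(idx >= 0);  /* the caller must pass a character from the set */
  return CHARSET64[(idx + (ks % 64)) % 64];
}

/* Inverse of encrypt_char64(): subtract the reduced keystream byte.
 * Adding 64 first keeps the intermediate value non-negative. */
static char decrypt_char64(char ch, unsigned char ks) {
  int idx = char_to_index(ch);

  assert(idx >= 0);
  return CHARSET64[(idx + 64 - (ks % 64)) % 64];
}
```

For instance, with a keystream byte of 65 (which reduces to 1), the character A at index 0 encrypts to B, and decrypting B with the same byte yields A again.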

See Recipe 5.2 for a discussion of picking a streaming cipher solution. Generally, we recommend using AES in CTR mode or the SNOW 2.0 stream cipher.

Discussion

If each character in your character set occupies a single 8-bit byte (e.g., a subset of ASCII rather than a multibyte Unicode encoding), the following code will work:

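typedef struct {
  unsigned char *cset;          /* the valid characters, in index order */
  int           csetlen;        /* number of valid characters (1 to 256) */
  unsigned char reverse[256];   /* byte value -> index into cset */
  unsigned char maxvalid;       /* largest keystream byte that is accepted */
} ENCMAP;

void setup_charset_map(ENCMAP *s, unsigned char *charset, int csetlen) {
  int i;

  s->cset    = charset;
  s->csetlen = csetlen;

  /* 255 is only a placeholder: input must contain nothing but charset bytes. */
  for (i = 0;  i < 256;  i++) s->reverse[i] = 255;
  for (i = 0;  i < csetlen;  i++) s->reverse[charset[i]] = i;
  /* Bytes above this cutoff would bias the result when reduced mod csetlen. */
  s->maxvalid = 255 - (256 % csetlen);
}

void encrypt_within_charset(ENCMAP *s, unsigned char *in, long inlen,
                            unsigned char *out, unsigned char (*keystream_byte)(void)) {
  long          i;
  unsigned char c;

  for (i = 0;  i < inlen;  i++) {
    do {  /* discard keystream bytes above the cutoff */
      c = (*keystream_byte)();
    } while (c > s->maxvalid);
    *out++ = s->cset[(s->reverse[*in++] + c) % s->csetlen];
  }
}

/* Decryption consumes keystream exactly as encryption does, but subtracts it.
 * Adding csetlen before subtracting keeps the intermediate value non-negative. */
void decrypt_within_charset(ENCMAP *s, unsigned char *in, long inlen,
                            unsigned char *out, unsigned char (*keystream_byte)(void)) {
  long          i;
  unsigned char c;

  for (i = 0;  i < inlen;  i++) {
    do {
      c = (*keystream_byte)();
    } while (c > s->maxvalid);
    *out++ = s->cset[(s->reverse[*in++] + s->csetlen - (c % s->csetlen)) % s->csetlen];
  }
}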

The function setup_charset_map( ) must be called once to set up a table that maps byte values to indexes into the valid subset of characters. Its first argument, s, points to the ENCMAP structure that stores the mapping data. The other two arguments are charset, a list of all characters in the valid subset (each listed once), and csetlen, which specifies the number of characters in that set. The map keeps a pointer to charset rather than copying it, so that buffer must remain valid for as long as the map is in use.

Once the character map is set up, you can call encrypt_within_charset( ) to encrypt data and decrypt_within_charset( ) to decrypt it, while staying within the specified character set. Every byte of input must be a member of that character set. Both functions take the following arguments:

s

Pointer to the ENCMAP object.

in

Buffer containing the data to be encrypted or decrypted.

inlen

Length in bytes of the input buffer.

out

Buffer into which the encrypted or decrypted data is placed.

keystream_byte

Pointer to a callback function that should return a single byte of cryptographically strong keystream.

This code needs to be able to get more bytes of keystream on demand, because some bytes of keystream are thrown away. Any keystream byte greater than maxvalid is discarded: unless 256 is an exact multiple of csetlen, reducing every possible byte modulo csetlen would make some values slightly more likely than others, and that bias could be leveraged in a statistical attack. Therefore, the amount of keystream necessary is theoretically unbounded. In practice, fewer than half of the 256 byte values are ever rejected, so the expected amount of keystream is less than twice the length of the input. As a result, we pass in a function that returns new keystream instead of a buffer of static keystream.
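
To make the bias concrete, the following sketch counts how many byte values land on each residue, with and without the cutoff. The helper names count_residues( ), max_valid_byte( ), and is_uniform( ) are hypothetical and exist only for this illustration; max_valid_byte( ) repeats the cutoff formula used in setup_charset_map( ).

```c
#include <assert.h>

/* Count how many byte values 0..limit fall on each residue modulo csetlen. */
static void count_residues(int csetlen, int limit, int counts[256]) {
  int v, r;

  assert(csetlen > 0 && csetlen <= 256);
  assert(limit >= 0 && limit <= 255);
  for (r = 0;  r < 256;  r++) counts[r] = 0;
  for (v = 0;  v <= limit;  v++) counts[v % csetlen]++;
}

/* Same cutoff as setup_charset_map(): the largest acceptable keystream byte. */
static int max_valid_byte(int csetlen) {
  return 255 - (256 % csetlen);
}

/* Returns 1 if residues 0..csetlen-1 are all equally likely when keystream
 * bytes are drawn uniformly from 0..limit, and 0 otherwise. */
static int is_uniform(int csetlen, int limit) {
  int counts[256], r;

  count_residues(csetlen, limit, counts);
  for (r = 1;  r < csetlen;  r++)
    if (counts[r] != counts[0]) return 0;
  return 1;
}
```

For a 10-character set, using all 256 byte values would give residues 0 through 5 a weight of 26 each and residues 6 through 9 a weight of only 25. With the cutoff at 249, each residue has a weight of exactly 25. For a 64-character set the cutoff is 255, so no keystream is discarded at all.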

It would be easy (and preferable) to extend this code example to use a cipher context object (keyed and in a streaming mode) as a parameter instead of the function pointer. Then you could get the next byte of keystream directly from the passed context object. If your crypto library does not allow you direct access to keystream, encrypting a buffer of all zeros returns the raw keystream, because XORing the keystream with zero leaves it unchanged.
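
A context-based keystream source along these lines might look like the sketch below. Everything in it is an assumption made for illustration: TOY_CIPHER_CTX and toy_cipher_encrypt( ) stand in for whatever keyed context and encrypt call your crypto library provides, and the toy generator behind them is not cryptographically secure. KEYSTREAM_CTX, keystream_init( ), and keystream_next( ) are likewise hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* HYPOTHETICAL stand-in for a library's keyed stream-cipher context.
 * The generator is a toy linear congruential generator and is NOT secure;
 * it only mimics the shape "XOR the data with keystream". */
typedef struct {
  unsigned long state;
} TOY_CIPHER_CTX;

static void toy_cipher_init(TOY_CIPHER_CTX *ctx, unsigned long key) {
  ctx->state = key;
}

/* Mimics a library call that exposes encryption but not raw keystream. */
static void toy_cipher_encrypt(TOY_CIPHER_CTX *ctx, const unsigned char *in,
                               unsigned char *out, size_t len) {
  size_t i;

  for (i = 0;  i < len;  i++) {
    ctx->state = (ctx->state * 1103515245UL + 12345UL) & 0xffffffffUL;
    out[i] = in[i] ^ (unsigned char)(ctx->state >> 16);
  }
}

/* Buffered keystream source built on top of the encrypt-only interface. */
typedef struct {
  TOY_CIPHER_CTX cipher;
  unsigned char  buf[64];
  size_t         used;   /* number of bytes of buf already handed out */
} KEYSTREAM_CTX;

static void keystream_init(KEYSTREAM_CTX *k, unsigned long key) {
  assert(k != NULL);
  toy_cipher_init(&k->cipher, key);
  k->used = sizeof(k->buf);  /* buffer starts empty; refill on first use */
}

/* Return the next keystream byte, refilling the buffer when it runs out. */
static unsigned char keystream_next(KEYSTREAM_CTX *k) {
  assert(k != NULL);
  if (k->used == sizeof(k->buf)) {
    unsigned char zeros[sizeof(k->buf)];

    memset(zeros, 0, sizeof(zeros));
    /* x ^ 0 == x, so encrypting zeros yields the raw keystream. */
    toy_cipher_encrypt(&k->cipher, zeros, k->buf, sizeof(k->buf));
    k->used = 0;
  }
  return k->buf[k->used++];
}
```

With this design, the encryption routine would take a KEYSTREAM_CTX pointer in place of the function pointer and call keystream_next( ) inside its rejection loop. Buffering a block of keystream at a time avoids one library call per byte.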

Warning

Remember to use a MAC anytime you encrypt, even though this expands your message length. The MAC is almost always necessary for security! For databases, you can always base64-encode the MAC output and stick it in another field. (See Recipe 6.9 for how to MAC data securely.)

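Note that encryption and decryption are not the same operation here. Encryption adds the keystream value modulo csetlen, so decryption must subtract it; unlike XOR, modular addition does not undo itself. A decrypt_within_charset( ) routine therefore has to differ from encrypt_within_charset( ) in that one step, and simply aliasing one name to the other is correct only in the degenerate case in which csetlen is 2. Both directions must discard exactly the same keystream bytes so that they stay in sync.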

The previous code can be adapted to fixed-size wide characters if you change it to operate on values of the appropriate size; as written, it handles one byte per character. It isn’t useful for variable-byte character sets, however. With such data, we recommend that you accept a solution that involves message expansion, such as encrypting, then base64-encoding the result.
