4.1. Background and Terminology

In the “bad old days” of computing, roughly contemporaneous with the use of punched cards, there was a proliferation of character sets. Fortunately, those days are largely forgotten, thanks to ASCII, which was standardized in the 1960s and came into widespread use in the 1970s.

ASCII stands for American Standard Code for Information Interchange. It was a big step forward, but the operative word here is American. It was never designed to handle even European languages, much less Asian ones.

But there were loopholes. ASCII is a 7-bit code, so it defines only 128 characters. An 8-bit byte was standard, though, so how could we waste that extra bit? The natural idea was to make a superset of ASCII, using the codes 128 through 255 for other purposes. The trouble is, this was done ...
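The 7-bit limit and the spare eighth bit can be seen directly in Ruby. The following is only a minimal sketch, assuming a Ruby version with the String encoding API (1.9 or later); the variable names are illustrative, and ISO-8859-1 is chosen here simply as one well-known superset of ASCII.

```ruby
# Every ASCII character has a code below 128, so it fits in 7 bits.
ascii_text    = "Ruby"
ascii_codes   = ascii_text.bytes                     # [82, 117, 98, 121]
all_seven_bit = ascii_codes.all? { |b| b < 128 }     # true

# A byte in the range 128..255 needs the eighth bit and is not ASCII.
high_byte     = [233].pack("C")                      # one-byte binary string
high_is_ascii = high_byte.ascii_only?                # false

# ISO-8859-1 (an assumption for this example) assigns byte 233 to "é".
latin1  = high_byte.dup.force_encoding(Encoding::ISO_8859_1)
as_utf8 = latin1.encode(Encoding::UTF_8)             # "é"
```

The byte itself carries no meaning until an encoding is attached to it, which is why force_encoding is used here to label the raw byte before converting it.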
