At the lowest level, a Ruby program is simply a sequence of characters. Ruby’s lexical rules are defined using characters of the ASCII character set. Comments begin with the # character (ASCII code 35), for example, and allowed whitespace characters are horizontal tab (ASCII 9), newline (10), vertical tab (11), form feed (12), carriage return (13), and space (32). All Ruby keywords are written using ASCII characters, and all operators and other punctuation are drawn from the ASCII character set.
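The character codes quoted above can be checked from Ruby itself. The following is a minimal sketch, assuming Ruby 1.9 or later, where String#ord returns the integer code of a string's first character. The variable names are illustrative only.

```ruby
# Look up the ASCII code of the comment character.
# Single quotes are used so that # is treated as a plain character.
comment_code = '#'.ord

# The six whitespace characters listed in the text, written with
# their usual escape sequences, and the code of each one.
whitespace = ["\t", "\n", "\v", "\f", "\r", " "]
whitespace_codes = whitespace.map { |c| c.ord }

puts comment_code               # prints 35
puts whitespace_codes.inspect   # prints [9, 10, 11, 12, 13, 32]
```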
By default, the Ruby interpreter assumes that Ruby source code is encoded in ASCII. This is not required, however; the interpreter can also process files that use other encodings, as long as those encodings can represent the full set of ASCII characters. In order for the Ruby interpreter to be able to interpret the bytes of a source file as characters, it must know what encoding to use. Ruby files can identify their own encodings or you can tell the interpreter how they are encoded. Doing so is explained shortly.
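As a preview, here is a small sketch of how a file might declare its own encoding and how a program can ask which encoding the interpreter is using for the current file. It assumes Ruby 1.9 or later, where a coding comment on the first line of a file (or the second, after a shebang line) is recognized and the __ENCODING__ keyword evaluates to an Encoding object. The default encoding differs between Ruby versions, so no specific output is shown here.

```ruby
# coding: utf-8

# __ENCODING__ evaluates to the Encoding object that the interpreter
# is using to read this source file.
source_encoding = __ENCODING__

# Print the encoding's name, and whether it can represent ASCII
# characters using the same byte values as ASCII itself.
puts source_encoding.name
puts source_encoding.ascii_compatible?
```

The ascii_compatible? check corresponds to the requirement stated above: an encoding is usable for source code only if it can represent the full set of ASCII characters.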
The Ruby interpreter is actually quite flexible about the characters that appear in a Ruby program. Certain ASCII characters have specific meanings, and certain ASCII characters are not allowed in identifiers, but beyond that, a Ruby program may contain any characters allowed by the encoding. We explained earlier that identifiers may contain characters outside of the ASCII character set. The same is true for comments and string and regular expression literals: they ...