Ruby’s Support for Character Encodings

Ruby’s support for character encodings has improved slowly but surely over recent versions. The clearest way to explain what the latest version offers is to trace the journey from Ruby 1.8, which supported no character encoding apart from US-ASCII, to the rather robust support available today.

Ruby 1.8

In Ruby 1.8, released in 2003, there was essentially no support for character encodings at all. Source files were always interpreted as US-ASCII, and string methods worked on individual bytes rather than characters, so they often produced wrong results when they encountered multi-byte characters. For example, consider what the following code returns when run using Ruby 1.8:

"Hellø".reverse # => "\270\303lleH"

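To see where that output comes from, here is a minimal sketch that simulates a byte-wise reversal on a modern Ruby (1.9 or later). It assumes the string is UTF-8 encoded, in which ø occupies the two bytes 0xC3 0xB8 (octal 303 and 270). The variable names are illustrative only and do not come from the original text.

```ruby
# Assumption: the text is UTF-8, so "ø" (U+00F8) is stored as two bytes.
text = "Hell\u00F8"                 # "Hellø"

# The raw bytes of the string: four ASCII letters, then the two bytes of "ø".
utf8_bytes = text.bytes             # [72, 101, 108, 108, 195, 184]

# Reversing byte by byte (roughly what a byte-oriented reverse does)
# swaps the two bytes of "ø", leaving an invalid UTF-8 sequence.
byte_reversed = utf8_bytes.reverse.pack("C*")

# A character-aware reverse keeps the two bytes of "ø" together.
char_reversed = text.reverse

puts byte_reversed.inspect          # "\xB8\xC3lleH"
puts char_reversed                  # ølleH
```

The hexadecimal escapes `\xB8\xC3` printed here are the same two bytes that Ruby 1.8 displayed in octal as `\270\303`: the bytes of ø in the wrong order.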