Reading/Writing a Different Character Set

Problem

You need to read or write a text file using a particular encoding.

Solution

Convert the text to or from Java’s internal Unicode representation by specifying an encoding when you construct an InputStreamReader or OutputStreamWriter (or a PrintWriter built on one).
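As a minimal sketch of this approach (not necessarily the book's own listing), the following writes a line to a temporary file in a named encoding and reads it back with the same encoding. The class name EncodingRoundTrip, the roundTrip helper, and the temporary-file prefix are hypothetical names chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;

class EncodingRoundTrip {

    /**
     * Writes text to a temporary file using the named encoding, then reads
     * it back with the same encoding. (Hypothetical helper for illustration.)
     */
    static String roundTrip(String text, String encodingName) {
        try {
            File file = File.createTempFile("encoding-demo", ".txt");
            try {
                // Writing: chars -> bytes, translated by the OutputStreamWriter.
                try (FileOutputStream bytesOut = new FileOutputStream(file);
                     PrintWriter out = new PrintWriter(
                             new OutputStreamWriter(bytesOut, encodingName))) {
                    out.print(text);
                    out.print('\n');
                }
                // Reading: bytes -> chars, translated by the InputStreamReader.
                try (FileInputStream bytesIn = new FileInputStream(file);
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(bytesIn, encodingName))) {
                    return in.readLine();
                }
            } finally {
                file.delete();
            }
        } catch (IOException e) {
            // Also covers UnsupportedEncodingException for an unknown name.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "cafe" with an acute accent on the e
        System.out.println(roundTrip(text, "ISO-8859-1").equals(text));
        System.out.println(roundTrip(text, "UTF-8").equals(text));
    }
}
```

The text survives the trip only because the same encoding name is used on both sides, and because that encoding can represent every character being written.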

Discussion

The InputStreamReader and OutputStreamWriter classes are the bridge from byte-oriented Streams to character-based Readers and Writers. These classes read or write bytes and translate them to or from characters according to a specified character encoding. Inside Java, the char and String types hold Unicode text as 16-bit code units. But most character sets, such as ASCII and those used for Swedish, Spanish, Greek, Turkish, and many other languages, use only a small subset of Unicode. In fact, many European language character sets fit nicely into 8-bit characters. Even the larger character sets (script-based and pictographic languages) don’t all use the same bit values for each particular character. The encoding, then, is a mapping between Unicode characters and a particular external storage format for characters drawn from a particular national or linguistic character set.
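To make the idea of a mapping concrete, here is a small sketch that encodes one character under three encodings and prints the resulting bytes in hexadecimal. The class name EncodingBytes and the hex helper are hypothetical. The three encodings were chosen because every Java platform is required to support them.

```java
import java.nio.charset.Charset;

class EncodingBytes {

    /** Returns the bytes of text in the named encoding as space-separated hex. */
    static String hex(String text, String encodingName) {
        byte[] bytes = text.getBytes(Charset.forName(encodingName));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
        System.out.println("ISO-8859-1: " + hex(text, "ISO-8859-1")); // E9
        System.out.println("UTF-8:      " + hex(text, "UTF-8"));      // C3 A9
        System.out.println("UTF-16BE:   " + hex(text, "UTF-16BE"));   // 00 E9
    }
}
```

The same Unicode character becomes one byte in ISO-8859-1 and two different two-byte sequences in UTF-8 and UTF-16BE, which is why a reader must know which encoding the writer used.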

To simplify matters, the InputStreamReader and OutputStreamWriter constructors are the main places where you specify the name of an encoding to be used in this translation. If you do not, the platform’s (or user’s) default encoding is used. PrintWriters, BufferedReaders, and the like all use whatever encoding the underlying InputStreamReader or OutputStreamWriter uses. Since ...
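The point that a wrapper such as BufferedReader has no encoding of its own can be sketched as follows. The same UTF-8 bytes are read through a BufferedReader twice, and only the encoding given to the inner InputStreamReader changes. The class name WrapperEncoding and the decodeUtf8BytesAs helper are hypothetical, and an in-memory byte array stands in for a file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class WrapperEncoding {

    /**
     * Encodes text as UTF-8 bytes, then reads one line back through a
     * BufferedReader whose InputStreamReader decodes with encodingName.
     */
    static String decodeUtf8BytesAs(String text, String encodingName) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(utf8), encodingName))) {
            // BufferedReader only buffers chars; the decoding was already
            // done by the InputStreamReader it wraps.
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";

        // Matching encoding: the original text comes back.
        System.out.println(decodeUtf8BytesAs(text, "UTF-8").equals(text)); // true

        // Mismatched encoding: each UTF-8 byte of the accented letter is
        // decoded as a separate ISO-8859-1 character (U+00C3, U+00A9).
        System.out.println(
                decodeUtf8BytesAs(text, "ISO-8859-1").equals("caf\u00c3\u00a9")); // true
    }
}
```

Relying on the default encoding has the same effect as the mismatched case whenever the platform default differs from the encoding the data was written in, so naming the encoding explicitly is usually the safer choice for files that move between systems.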
