Converting Between Byte Arrays and Strings

The java.lang.String class has several constructors that form strings from byte arrays and several methods that return a byte array corresponding to a given string. Any time a Unicode string is converted to bytes or vice versa, that conversion happens according to one of the encodings listed in Table 2.4. The same string can produce different byte arrays if different encodings are used. Six constructors form a new String object from a byte array:

public String(byte[] ascii, int highByte)
public String(byte[] ascii, int highByte, int offset, int length)
public String(byte[] data, String encoding) 
  throws UnsupportedEncodingException
public String(byte[] data, int offset, int length, String encoding) 
  throws UnsupportedEncodingException
public String(byte[] data)
public String(byte[] data, int offset, int length)
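To make the point about encodings concrete, here is a small sketch. It assumes only that the standard encoding names "ISO-8859-1" and "UTF-8" are available, which every Java platform is required to support. The class name EncodingDemo and the helper encode are illustrative, not part of the JDK. The string "café" encodes to four bytes in Latin-1 but five in UTF-8, and decoding with the wrong encoding silently produces different text.

```java
import java.io.UnsupportedEncodingException;

// Sketch: the same String yields different byte arrays under different
// encodings, and the String(byte[], ...) constructors reverse the process.
// EncodingDemo and encode() are hypothetical names used for illustration.
public class EncodingDemo {

    // Encode s with the named encoding; throws if the name is not supported.
    static byte[] encode(String s, String encoding) throws UnsupportedEncodingException {
        return s.getBytes(encoding);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "caf\u00e9"; // "café"; the escape keeps this source file ASCII-only

        byte[] latin1 = encode(s, "ISO-8859-1"); // é becomes the single byte 0xE9
        byte[] utf8 = encode(s, "UTF-8");        // é becomes two bytes, 0xC3 0xA9
        System.out.println(latin1.length + " " + utf8.length); // 4 5

        // Decoding with the encoding that produced the bytes recovers the string.
        System.out.println(new String(utf8, "UTF-8").equals(s)); // true

        // The offset/length variant decodes only part of the array.
        System.out.println(new String(utf8, 0, 3, "UTF-8")); // caf

        // Decoding with the wrong encoding garbles the text: the two UTF-8
        // bytes for é are read as two separate Latin-1 characters.
        String garbled = new String(utf8, "ISO-8859-1");
        System.out.println(garbled.length()); // 5

        // The constructors without an encoding argument use the platform's
        // default encoding, so their result varies from system to system.
        String platform = new String(latin1);
        System.out.println(platform.length());
    }
}
```

Passing an encoding name explicitly, as in the two-argument and four-argument constructors, is what makes the result predictable. The last call shows why relying on the default is risky when bytes move between machines.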

The first two constructors, the ones with the highByte argument, are leftovers from Java 1.0 and were deprecated in Java 1.1. These two constructors do not accurately translate non-Latin-1 character sets into Unicode. Instead, they read each byte in the ascii array as the low-order byte of a two-byte character, then fill in the high-order byte with the highByte argument. For example:

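byte[] isoLatin1 = new byte[256];
// Fill the array with every byte value, 0x00 through 0xFF
for (int i = 0; i < 256; i++) isoLatin1[i] = (byte) i;
// Deprecated: each byte becomes the low byte of a char whose high byte is 0
String s = new String(isoLatin1, 0);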

Frankly, this is a kludge; it’s deprecated for good reason. This scheme works quite well for Latin-1 data with a high byte of ...
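The sketch below spells out what the deprecated constructor does, using the formula given in the JDK documentation for String(byte[] ascii, int hibyte): each char is ((hibyte & 0xff) << 8) | (b & 0xff). The helper fromHighByte and the class HighByteDemo are hypothetical names for illustration; the comparison against the real deprecated constructor still compiles on current JDKs, with a deprecation warning.

```java
// Sketch of the deprecated String(byte[] ascii, int highByte) behavior:
// each byte supplies the low 8 bits of a char and highByte supplies the
// high 8 bits. fromHighByte is a hypothetical helper, not a JDK method.
public class HighByteDemo {

    static String fromHighByte(byte[] ascii, int highByte) {
        char[] chars = new char[ascii.length];
        for (int i = 0; i < ascii.length; i++) {
            // Mask with 0xFF so bytes 0x80-0xFF (negative in Java) are not sign-extended.
            chars[i] = (char) (((highByte & 0xFF) << 8) | (ascii[i] & 0xFF));
        }
        return new String(chars);
    }

    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        byte[] isoLatin1 = new byte[256];
        for (int i = 0; i < 256; i++) isoLatin1[i] = (byte) i;

        // With a high byte of 0, byte value n maps to Unicode code point n,
        // which is exactly the Latin-1 mapping.
        String s = fromHighByte(isoLatin1, 0);
        System.out.println(s.equals(new String(isoLatin1, 0))); // true
        System.out.println((int) s.charAt(0xE9)); // 233

        // A nonzero high byte shifts every char into another Unicode block,
        // whatever the bytes actually encode.
        String t = fromHighByte(isoLatin1, 0x04);
        System.out.printf("%04X%n", (int) t.charAt(0x10)); // 0410
    }
}
```

Because the same high byte is applied to every character, this can only ever reach one 256-character block of Unicode at a time, which is why it cannot correctly decode multibyte encodings or single-byte encodings whose characters are scattered across Unicode.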
