Converting Between Byte Arrays and Strings
The java.lang.String class has several constructors that form strings from byte arrays and several methods that return a byte array corresponding to a given string. Anytime a Unicode string is converted to bytes or vice versa, that conversion happens according to one of the encodings listed in Table 2.4. The same string can produce different byte arrays if different encodings are used. Six constructors form a new String object from a byte array:
public String(byte[] ascii, int highByte)
public String(byte[] ascii, int highByte, int offset, int length)
public String(byte[] data, String encoding) throws UnsupportedEncodingException
public String(byte[] data, int offset, int length, String encoding) throws UnsupportedEncodingException
public String(byte[] data)
public String(byte[] data, int offset, int length)
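To illustrate how the same string can yield different byte arrays, here is a minimal sketch that encodes one string two ways and decodes it again. The class name EncodingDemo and the helpers encode and decode are hypothetical wrappers, not part of the String API. The encoding names "ISO-8859-1" and "UTF-8" are assumed here because every Java platform is required to support them; the names listed in Table 2.4 may be spelled differently.

```java
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

// Hypothetical demo class; only getBytes(String) and String(byte[], String)
// are real String API calls.
class EncodingDemo {

    // Converts a string to bytes using the named encoding.
    static byte[] encode(String text, String encoding) {
        try {
            return text.getBytes(encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + encoding, e);
        }
    }

    // Converts bytes back to a string using the named encoding.
    static String decode(byte[] data, String encoding) {
        try {
            return new String(data, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00E9"; // "cafe" with an acute accent on the e

        byte[] latin1 = encode(text, "ISO-8859-1");
        byte[] utf8 = encode(text, "UTF-8");

        System.out.println(Arrays.toString(latin1)); // [99, 97, 102, -23]
        System.out.println(Arrays.toString(utf8));   // [99, 97, 102, -61, -87]

        // Decoding with the encoding that produced the bytes restores the string.
        System.out.println(decode(utf8, "UTF-8").equals(text)); // true
    }
}
```

Decoding with a different encoding from the one used to produce the bytes generally does not restore the original string, which is why the encoding has to travel with the data.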
The first two constructors, the ones with the highByte argument, are leftovers from Java 1.0 that are deprecated in Java 1.1. These two constructors do not accurately translate non-Latin-1 character sets into Unicode. Instead, they read each byte in the ascii array as the low-order byte of a two-byte character, then fill in the high-order byte with the highByte argument. For example:
byte[] isoLatin1 = new byte[256];
for (int i = 0; i < 256; i++) isoLatin1[i] = (byte) i;
String s = new String(isoLatin1, 0);
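The byte-combining rule described above can be spelled out by hand. The following is a sketch, not the library's source: the class HighByteDemo and the method widen are hypothetical names, and the bit arithmetic follows the formula given in the String documentation for the deprecated constructor.

```java
// Hypothetical demo class reproducing the highByte scheme manually.
class HighByteDemo {

    // Builds each char from one array byte (low-order 8 bits)
    // and the shared highByte argument (high-order 8 bits).
    static String widen(byte[] ascii, int highByte) {
        char[] chars = new char[ascii.length];
        for (int i = 0; i < ascii.length; i++) {
            chars[i] = (char) (((highByte & 0xFF) << 8) | (ascii[i] & 0xFF));
        }
        return new String(chars);
    }

    // Calls the real, deprecated constructor for comparison.
    @SuppressWarnings("deprecation")
    static String viaDeprecatedConstructor(byte[] ascii, int highByte) {
        return new String(ascii, highByte);
    }

    public static void main(String[] args) {
        byte[] data = { 0x41, (byte) 0xE9 };

        // A high byte of 0 maps each byte to the Latin-1 character with that value.
        System.out.println(widen(data, 0).equals("A\u00E9")); // true

        // A nonzero high byte shifts every character into the same 256-character block.
        System.out.println(widen(new byte[] { 0x10 }, 0x04).equals("\u0410")); // true

        // The manual version agrees with the deprecated constructor.
        System.out.println(widen(data, 0).equals(viaDeprecatedConstructor(data, 0))); // true
    }
}
```

Because one highByte value applies to the whole array, every resulting character falls in a single 256-character block of Unicode, which is the limitation the text points out.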
Frankly, this is a kludge; it’s deprecated for good reason. This scheme works quite well for Latin-1 data with a high byte of ...