Input streams read bytes and output streams write bytes. Readers read characters and writers write characters. Therefore, to understand input and output, you first need a solid understanding of how Java deals with bytes, integers, characters, and other primitive data types, and when and why one is converted into another. In many cases Java’s behavior is not obvious.
The fundamental integer data type in Java is the int, a four-byte, big-endian, two’s complement integer. An int can take on all values between -2,147,483,648 and 2,147,483,647. When you type a literal integer like 7, -8345, or 3000000000 in Java source code, the compiler treats that literal as an int. In the case of 3000000000 or similar numbers too large to fit in an int, the compiler emits an error message citing “Numeric overflow.”
longs are eight-byte, big-endian, two’s complement integers with a range from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. long literals are indicated by suffixing the number with a lower- or uppercase L. An uppercase L is preferred because the lowercase l is too easily confused with the numeral 1 in most fonts. For example, 7L, -8345L, and 3000000000L are all 64-bit long literals.
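The ranges and literal rules above can be checked with a short sketch. The class name IntegerLiterals and its methods are hypothetical, chosen only for illustration; the constants come from the standard Integer and Long wrapper classes.

```java
final class IntegerLiterals {
    // A plain integer literal is an int; this one fits comfortably.
    static int plainLiteral() {
        return -8345;
    }

    // The L suffix makes the literal a long, so 3000000000 is legal here.
    // Without the suffix (int i = 3000000000;) the line would not compile.
    static long threeBillion() {
        return 3000000000L;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
        System.out.println(Long.MIN_VALUE + " to " + Long.MAX_VALUE);
        System.out.println(plainLiteral());
        System.out.println(threeBillion());
    }
}
```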
There are two more integer data types available in Java, the short and the byte.
shorts are two-byte, big-endian, two’s complement integers with a range from -32,768 to 32,767. They’re rarely used in Java and are included mainly for compatibility with C.
bytes, however, are very much used in Java. In particular they’re used in I/O. A byte is an eight-bit, two’s complement integer that ranges from -128 to 127. Note that, like every Java integer type other than char, a byte is signed. The maximum byte value is 127. The values 128, 129, and so on through 255 are not legal values for bytes.
There are no short or byte literals in Java. When you write the literal 42 or 24000, the compiler always reads it as an int, never as a byte or a short, even when it is used on the right-hand side of an assignment statement to a byte or short, like this:
byte b = 42;
short s = 24000;
However, in these lines a special assignment conversion is performed by the compiler, effectively casting the int literals to the narrower types. Because the int literals are constants known at compile time and their values fit in the target types, this is permitted. However, assignments from int variables to shorts and bytes are not, at least not without an explicit cast. For example, consider these lines:
int i = 42;
short s = i;
byte b = i;
Compiling these lines produces the following errors:
Error: Incompatible type for declaration. Explicit cast needed to convert int to short.
ByteTest.java line 6
Error: Incompatible type for declaration. Explicit cast needed to convert int to byte.
ByteTest.java line 7
Note that this occurs even though the compiler is theoretically capable of determining that the assignment does not lose information. To correct this, you must use explicit casts, like this:
int i = 42;
short s = (short) i;
byte b = (byte) i;
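The difference between constant assignment and variable assignment can be seen side by side in a minimal sketch. The class name NarrowingAssignments and its helper methods are hypothetical; the lines that would fail to compile are left as comments so the example still builds.

```java
final class NarrowingAssignments {
    // Narrowing an int variable always needs an explicit cast.
    static short toShort(int i) {
        return (short) i;
    }

    static byte toByte(int i) {
        return (byte) i;
    }

    public static void main(String[] args) {
        byte b = 42;          // allowed: compile-time constant that fits in a byte
        short s = 24000;      // allowed: compile-time constant that fits in a short
        // byte tooBig = 128; // would not compile: 128 does not fit in a byte

        int i = 42;
        // short s2 = i;      // would not compile: i is a variable, not a constant
        short s2 = toShort(i);
        byte b2 = toByte(i);
        System.out.println(b + " " + s + " " + s2 + " " + b2); // prints "42 24000 42 42"
    }
}
```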
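Even simple arithmetic that mixes a byte variable with a small constant produces “Explicit cast needed to convert int to byte” errors, because the result of the addition is an int. (A purely constant expression such as 1 + 2 is evaluated at compile time and is accepted, just like the literal 3.) For example, the second of these lines does not compile:
byte b1 = 1;
byte b2 = b1 + 2;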
In fact, even the addition of two byte variables produces an integer result and thus cannot be assigned to a byte variable without a cast; the following code produces that same error:
byte b1 = 22;
byte b2 = 23;
byte b3 = b1 + b2;
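One way to make such arithmetic compile is sketched below. The class ByteArithmetic and its method names are hypothetical. The first method casts the int result back to a byte; the second relies on the fact that Java's compound assignment operators include an implicit narrowing cast.

```java
final class ByteArithmetic {
    // b1 + b2 has type int, so the cast back to byte is required.
    static byte add(byte b1, byte b2) {
        return (byte) (b1 + b2);
    }

    // Compound assignment (+=) narrows the int result implicitly.
    static byte addInPlace(byte b1, byte b2) {
        b1 += b2;
        return b1;
    }

    public static void main(String[] args) {
        byte b1 = 22;
        byte b2 = 23;
        System.out.println(add(b1, b2));        // prints 45
        System.out.println(addInPlace(b1, b2)); // prints 45
    }
}
```

Note that either form silently wraps around when the sum does not fit in a byte; for example, adding 100 and 100 yields -56.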
For these reasons, working directly with byte variables is inconvenient at best. Many of the methods in the stream classes are documented as reading or writing bytes. However, what they really return or accept as arguments are ints in the range of an unsigned byte (0-255). This does not match any Java primitive data type. These ints are then converted into bytes internally.
For instance, according to the Java class library documentation, the read() method of java.io.InputStream returns “the next byte of data, or -1 if the end of the stream is reached.” On a little thought, this sounds suspicious. How is a -1 that appears as part of the stream data to be distinguished from a -1 indicating end of stream? In point of fact, the read() method does not return a byte; its signature indicates that it returns an int:
public abstract int read() throws IOException
This int is not a Java byte with a value between -128 and 127 but a more general unsigned byte with a value between 0 and 255. Hence, -1 can easily be distinguished from valid data values read from the stream.
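A small sketch makes the distinction concrete. It assumes an in-memory java.io.ByteArrayInputStream as the data source; the class ReadSingleBytes and the method readAll are hypothetical names. The stream contains the byte -1, which read() reports as 255, so it cannot be mistaken for the end-of-stream marker.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

final class ReadSingleBytes {
    // Collects every value returned by read() until the -1 end-of-stream marker.
    static List<Integer> readAll(InputStream in) throws IOException {
        List<Integer> values = new ArrayList<>();
        int b;
        while ((b = in.read()) != -1) {
            values.add(b); // always in the range 0 to 255
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = { 0, 127, -128, -1 };
        // prints [0, 127, 128, 255]
        System.out.println(readAll(new ByteArrayInputStream(data)));
    }
}
```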
The write() method in the java.io.OutputStream class is similarly problematic. It returns void, but takes an int as an argument:
public abstract void write(int b) throws IOException
This int is intended to be an unsigned byte value between 0 and 255. However, there’s nothing to stop a careless programmer from passing in an int value outside that range. In this case, the eight low-order bits are written and the 24 high-order bits are ignored. This is the effect of taking the remainder modulo 256 of the int b and adding 256 if the value is negative; that is,
b = b % 256 >= 0 ? b % 256 : 256 + b % 256;
More simply, using bitwise operators:
b = b & 0x000000FF;
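The two formulas can be compared against what a real stream stores. The following sketch assumes java.io.ByteArrayOutputStream as the concrete subclass; the class WriteLowOrderByte and its method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;

final class WriteLowOrderByte {
    // The remainder-based formula from the text.
    static int byRemainder(int b) {
        return b % 256 >= 0 ? b % 256 : 256 + b % 256;
    }

    // The equivalent bitwise mask.
    static int byMask(int b) {
        return b & 0xFF;
    }

    // Writes b to an in-memory stream and returns the stored byte as 0 to 255.
    static int written(int b) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(b);
        return out.toByteArray()[0] & 0xFF;
    }

    public static void main(String[] args) {
        int[] samples = { 0, 255, 256, 257, -1, 1000 };
        for (int b : samples) {
            System.out.println(b + " -> " + byRemainder(b) + ", "
                    + byMask(b) + ", " + written(b));
        }
    }
}
```

For 257, for example, all three methods yield 1; for -1 they yield 255.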
Note: Although this is the behavior specified by the Java Language Specification, since the write() method is abstract, actual implementation of this scheme is left to the subclasses, and a careless programmer could do something different.
On the other hand, real Java bytes are used in those methods that read or write arrays of bytes. For example, consider these two read() methods from java.io.InputStream:
public int read(byte[] data) throws IOException
public int read(byte[] data, int offset, int length) throws IOException
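As a rough illustration of the three-argument form, the sketch below fills a byte array from a stream. The class ReadIntoArray and the method fill are hypothetical names. The loop is there because a single call may return fewer bytes than requested, and the values stored in the array are signed Java bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadIntoArray {
    // Fills as much of buffer as the stream allows; returns the number of bytes read.
    static int fill(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n == -1) {
                break; // end of stream reached before the buffer was full
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[8];
        int n = fill(new ByteArrayInputStream(new byte[] { 1, -1, 127 }), buffer);
        System.out.println(n + " bytes read; second byte is " + buffer[1]);
    }
}
```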
While the difference between an 8-bit byte and a 32-bit int is insignificant for a single number, it can be very significant when several thousand to several million numbers are read. In fact, a single byte still takes up four bytes of space inside the Java virtual machine, but a byte array only occupies the amount of space it actually needs. The virtual machine includes special instructions for operating on byte arrays, but does not include any instructions for operating on single bytes. They’re just promoted to ints.
Although data is stored in the array as signed Java bytes with values between -128 and 127, there’s a simple one-to-one correspondence between these signed values and the unsigned bytes normally used in I/O, given by the following formula:
int unsignedByte = signedByte >= 0 ? signedByte : 256 + signedByte;
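The formula can be written as a method and compared with two common alternatives: masking with 0xFF, and the library method Byte.toUnsignedInt (available since Java 8). The class UnsignedBytes and its method names are hypothetical.

```java
final class UnsignedBytes {
    // The formula from the text.
    static int byFormula(byte signedByte) {
        return signedByte >= 0 ? signedByte : 256 + signedByte;
    }

    // Masking keeps only the eight low-order bits after promotion to int.
    static int byMask(byte signedByte) {
        return signedByte & 0xFF;
    }

    public static void main(String[] args) {
        byte[] samples = { 0, 127, -128, -1 };
        for (byte b : samples) {
            System.out.println(b + " -> " + byFormula(b) + ", "
                    + byMask(b) + ", " + Byte.toUnsignedInt(b));
        }
    }
}
```

For the byte -1 all three produce 255, and for -128 all three produce 128.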
Since bytes have such a small range, they’re often converted to ints in calculations and method invocations. Often they need to be converted back, generally through a cast. Therefore, it’s useful to have a good grasp of exactly how the conversion occurs.
Casting from an int to a byte (or, for that matter, from any wider integer type to a narrower type) takes place through truncation of the high-order bytes. This means that as long as the value of the wider type can be expressed in the narrower type, the value is not changed. The int 127 cast to a byte still retains the value 127.
On the other hand, if the int value is too large for a byte, strange things happen. The int 128 cast to a byte is not 127, the nearest byte value. Instead, it is -128. This occurs through the wonders of two’s complement arithmetic. Written in hexadecimal, 128 is 0x00000080. When that int is cast to a byte, the leading zeros are truncated, leaving 0x80. In binary this can be written as 10000000. If this were an unsigned number, 10000000 would be 128 and all would be fine, but this isn’t an unsigned number. Instead, the leading bit is a sign bit, and that 1 does not indicate 2^7 but a minus sign. The absolute value of a negative number is found by taking the complement (changing all the 1 bits to 0 bits and vice versa) and adding 1. The complement of 10000000 is 01111111. Adding 1, you have 01111111 + 1 = 10000000 = 128 (decimal). Therefore, the byte 0x80 actually represents -128.
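The complement-and-add-one step can be traced in code. This is only a sketch for negative byte values; the class TwosComplementMagnitude and its method are hypothetical names.

```java
final class TwosComplementMagnitude {
    // Returns the absolute value of a negative byte by flipping its
    // eight bits and adding 1, as described in the text.
    static int magnitude(byte negative) {
        int bits = negative & 0xFF;    // raw eight-bit pattern, e.g. 10000000
        int complement = ~bits & 0xFF; // flip every bit within those eight bits
        return complement + 1;
    }

    public static void main(String[] args) {
        byte b = (byte) 0x80;
        System.out.println(b);            // prints -128
        System.out.println(magnitude(b)); // prints 128
    }
}
```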
Similar calculations show that the int 129 is cast to the byte -127, the int 130 is cast to the byte -126, the int 131 is cast to the byte -125, and so on. This continues through the int 255, which is cast to the byte -1.
When 256 is reached, the low-order byte of the int is filled with zeros. In other words, 256 is 0x00000100. Thus casting it to a byte produces 0, and the cycle starts over. This behavior can be reproduced algorithmically with this formula, though a cast is obviously simpler:
int byteValue;
int temp = intValue % 256;
if (intValue < 0) {
  byteValue = temp < -128 ? 256 + temp : temp;
}
else {
  byteValue = temp > 127 ? temp - 256 : temp;
}
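To check that the formula and the cast agree, the two can be wrapped in methods and compared. The class CastFormula and its method names are hypothetical; the body of byFormula is the formula above, unchanged apart from the return statement.

```java
final class CastFormula {
    // The arithmetic formula from the text.
    static int byFormula(int intValue) {
        int byteValue;
        int temp = intValue % 256;
        if (intValue < 0) {
            byteValue = temp < -128 ? 256 + temp : temp;
        } else {
            byteValue = temp > 127 ? temp - 256 : temp;
        }
        return byteValue;
    }

    // The plain narrowing cast, widened back to int for comparison.
    static int byCast(int intValue) {
        return (byte) intValue;
    }

    public static void main(String[] args) {
        int[] samples = { 127, 128, 129, 255, 256, -129 };
        for (int value : samples) {
            System.out.println(value + " -> " + byCast(value)
                    + " (formula: " + byFormula(value) + ")");
        }
    }
}
```

The sample values map to 127, -128, -127, -1, 0, and 127 respectively, by either method.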