Chapter 1. Introducing I/O

Input and output, I/O for short, are fundamental to any computer operating system or programming language. Only theorists find it interesting to write programs that don’t require input or produce output. At the same time, I/O hardly qualifies as one of the more “thrilling” topics in computer science. It’s something in the background, something you use every day—but for most developers, it’s not a topic with much sex appeal.

There are plenty of reasons for Java programmers to find I/O interesting. Java includes a particularly rich set of I/O classes in the core API, mostly in the java.io package. For the most part I/O in Java is divided into two types: byte- and number-oriented I/O, which is handled by input and output streams; and character and text I/O, which is handled by readers and writers. Both types provide an abstraction for external data sources and targets that allows you to read from and write to them, regardless of the exact type of the source. You use the same methods to read from a file that you do to read from the console or from a network connection.
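As a minimal sketch of that last point, the method below counts bytes from whatever input stream it is handed. The class name ByteCounter and the method countBytes() are invented for this illustration; an in-memory byte array stands in for a file, the console, or a network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ByteCounter {

  // Counts the bytes in any input stream. The method neither knows nor
  // cares whether the bytes come from a file, the console, or a socket.
  static long countBytes(InputStream in) throws IOException {
    long count = 0;
    while (in.read() != -1) {  // read() returns -1 at end of stream
      count++;
    }
    return count;
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical data source: three bytes held in memory.
    InputStream in = new ByteArrayInputStream(new byte[] {10, 20, 30});
    System.out.println(countBytes(in));  // prints 3
  }
}
```

Passing System.in or a file input stream to countBytes() would require no change to the method itself.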

But that’s just the tip of the iceberg. Once you’ve defined abstractions that let you read or write without caring where your data is coming from or where it’s going to, you can do a lot of very powerful things. You can define I/O streams that automatically compress data, encrypt it, convert it from one format to another, and more. Once you have these tools, programs can send encrypted data or write zip files with almost no knowledge of what they’re doing; cryptography or compression can be isolated in a few lines of code that say, “Oh yes, make this an encrypted output stream.”
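Here is a rough sketch of how little code that isolation can take, using the GZIP streams from java.util.zip. The class GzipRoundTrip and its methods are hypothetical names chosen for this example. Note that writePayload() has no idea that its output is being compressed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipRoundTrip {

  // Writes the payload to whatever stream it is given.
  static void writePayload(OutputStream out, byte[] payload) throws IOException {
    out.write(payload);
  }

  static byte[] compress(byte[] data) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    // This one line is where "make this a compressed stream" happens.
    OutputStream gzip = new GZIPOutputStream(sink);
    writePayload(gzip, data);
    gzip.close();  // closing writes the remaining compressed data and trailer
    return sink.toByteArray();
  }

  static byte[] decompress(byte[] compressed) throws IOException {
    InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
    ByteArrayOutputStream result = new ByteArrayOutputStream();
    int b;
    while ((b = in.read()) != -1) {
      result.write(b);
    }
    in.close();
    return result.toByteArray();
  }
}
```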

In this book, I’ll take a thorough look at all parts of Java’s I/O facilities. This includes all the different kinds of streams you can use. We’re also going to investigate Java’s support for Unicode (the standard multilingual character set). We’ll look at Java’s powerful facilities for formatting I/O—oddly enough, not part of the java.io package proper. (We’ll see the reasons for this design decision later.) Finally, we’ll take a brief look at the Java Communications API (javax.comm), which provides the ability to do low-level I/O through a computer’s serial and parallel ports.

I won’t go so far as to say, “If you’ve always found I/O boring, this is the book for you!” I will say that if you do find I/O uninteresting, you probably don’t know as much about it as you should. I/O is the means for communication between software and the outside world (including both humans and other machines). Java provides a powerful and flexible set of tools for doing this crucial part of the job.

Having said that, let’s start with the basics.

What Is a Stream?

A stream is an ordered sequence of bytes of undetermined length. Input streams move bytes of data into a Java program from some generally external source. Output streams move bytes of data from Java to some generally external target. (In special cases streams can also move bytes from one part of a Java program to another.)

The word stream is derived from an analogy with a stream of water. An input stream is like a siphon that sucks up water; an output stream is like a hose that sprays out water. Siphons can be connected to hoses to move water from one place to another. Sometimes a siphon may run out of water if it’s drawing from a finite source like a bucket. On the other hand, if the siphon is drawing water from a river, it may well provide water indefinitely. So too an input stream may read from a finite source of bytes like a file or an unlimited source of bytes like System.in. Similarly an output stream may have a definite number of bytes to output or an indefinite number of bytes.
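Connecting a siphon to a hose might look something like the following sketch. StreamCopier and copy() are names made up for this example, and the 1,024-byte buffer size is an arbitrary choice. The loop runs until the source runs dry, which a finite source signals by returning -1.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamCopier {

  // Moves bytes from the input stream (the siphon) to the output stream
  // (the hose) until the source is exhausted. Returns the number of bytes moved.
  static long copy(InputStream in, OutputStream out) throws IOException {
    byte[] buffer = new byte[1024];
    long total = 0;
    int n;
    while ((n = in.read(buffer)) != -1) {  // -1 means the source has run out
      out.write(buffer, 0, n);
      total += n;
    }
    out.flush();
    return total;
  }
}
```

With an unlimited source such as System.in attached to a console, the same loop simply keeps running until the user signals end of input.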

Input to a Java program can come from many sources. Output can go to many different kinds of destinations. The power of the stream metaphor and in turn the stream classes is that the differences between these sources and destinations are abstracted away. All input and output are simply treated as streams.

Where Do Streams Come From?

The first source of input most programmers encounter is System.in. This is the same thing as stdin in C, generally some sort of console window, probably the one in which the Java program was launched. If input is redirected so the program reads from a file, then System.in is changed as well. For instance, on Unix, the following command redirects stdin so that when the MessageServer program reads from System.in, the actual data comes from the file data.txt instead of the console:

% java MessageServer < data.txt

The console is also available for output through the static field out in the java.lang.System class, that is, System.out. This is equivalent to stdout in C parlance and may be redirected in a similar fashion. Finally, stderr is available as System.err. This is most commonly used for debugging and printing error messages from inside catch clauses. For example:

try {
  //... do something that might throw an exception
}
catch (Exception e) { System.err.println(e); }

Both System.out and System.err are print streams, that is, instances of java.io.PrintStream.
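Because they are ordinary print streams, code written against PrintStream works equally well with the console or with any other print stream. The sketch below assumes a hypothetical report() method; capture() sends the same output into a byte array instead of the console.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class PrintStreamDemo {

  // Writes "label=value" to any print stream, System.out included.
  static void report(PrintStream out, String label, int value) {
    out.print(label);
    out.print('=');
    out.print(value);
  }

  // Runs report() against an in-memory print stream and returns the text.
  static String capture(String label, int value) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    PrintStream ps = new PrintStream(buffer);
    report(ps, label, value);
    ps.flush();
    return buffer.toString();
  }

  public static void main(String[] args) {
    report(System.out, "count", 3);  // same method, this time on the console
    System.out.println();
  }
}
```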

Files are another common source of input and destination for output. File input streams provide a stream of data that starts with the first byte in a file and finishes with the last byte in the file. File output streams write data into a file, either by erasing the file’s contents and starting from the beginning or by appending data to the file. These will be introduced in Chapter 4.
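As a preview, the two ways of opening a file for output can be sketched as follows. The class FileStreamDemo and its helper methods are hypothetical; the boolean argument to the FileOutputStream constructor selects appending rather than overwriting. For simplicity, this sketch assumes the file contains only ASCII text.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class FileStreamDemo {

  // Writes text to the file. If append is false, any existing contents
  // are erased first; if true, the new bytes go after the existing ones.
  static void write(File file, String text, boolean append) throws IOException {
    FileOutputStream out = new FileOutputStream(file, append);
    try {
      out.write(text.getBytes("US-ASCII"));
    } finally {
      out.close();
    }
  }

  // Reads the file from its first byte to its last.
  static String readAll(File file) throws IOException {
    FileInputStream in = new FileInputStream(file);
    try {
      StringBuilder contents = new StringBuilder();
      int b;
      while ((b = in.read()) != -1) {
        contents.append((char) b);  // safe only because the data is ASCII
      }
      return contents.toString();
    } finally {
      in.close();
    }
  }
}
```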

Network connections provide streams too. When you connect to a web server, an FTP server, or some other kind of server, you read the data it sends from an input stream connected to that server and write data onto an output stream connected to that server. These streams will be introduced in Chapter 5.

Java programs themselves produce streams. Byte array input streams, byte array output streams, piped input streams, and piped output streams all use the stream metaphor to move data from one part of a Java program to another. Most of these are introduced in Chapter 8.
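To give a rough idea of what this looks like, the following sketch moves a short message from one thread to another through a pair of piped streams. PipeDemo and sendThroughPipe() are names invented for the example, which assumes the message is ASCII text small enough to fit comfortably in the pipe.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

class PipeDemo {

  // A producer thread writes the message into the pipe; the calling
  // thread reads it back out of the connected input stream.
  static String sendThroughPipe(final byte[] message)
      throws IOException, InterruptedException {
    final PipedOutputStream out = new PipedOutputStream();
    PipedInputStream in = new PipedInputStream(out);

    Thread producer = new Thread(new Runnable() {
      public void run() {
        try {
          out.write(message);
        } catch (IOException e) {
          System.err.println(e);
        } finally {
          try {
            out.close();  // closing tells the reader the stream has ended
          } catch (IOException e) {
            System.err.println(e);
          }
        }
      }
    });
    producer.start();

    StringBuilder received = new StringBuilder();
    int b;
    while ((b = in.read()) != -1) {
      received.append((char) b);
    }
    in.close();
    producer.join();
    return received.toString();
  }
}
```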

Perhaps a little surprisingly, AWT (and Swing) components like TextArea do not produce streams. The issue here is ordering. Given a group of bytes provided as data, there must be a fixed order to those bytes for them to be read or written as a stream. However, a user can change the contents of a text area or a text field at any point, not just the end. Furthermore, they can delete text from the middle of a stream while a different thread is reading that data. Hence, streams aren’t a good metaphor for reading data from graphical user interface (GUI) components. You can, however, always use the strings they do produce to create a byte array input stream or a string reader.
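That last step might be sketched like this. The string passed in stands for whatever a component's getText() method returned at one moment in time; TextSnapshot and its methods are hypothetical names, and UTF-8 is an arbitrary choice of encoding for the byte stream.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.io.UnsupportedEncodingException;

class TextSnapshot {

  // Turns a snapshot of a component's text into a stream of bytes.
  // Later edits to the component do not affect the stream.
  static InputStream asByteStream(String text) throws UnsupportedEncodingException {
    return new ByteArrayInputStream(text.getBytes("UTF-8"));
  }

  // Turns the same snapshot into a character-oriented reader.
  static Reader asReader(String text) {
    return new StringReader(text);
  }
}
```

Because the string is an immutable copy, its bytes or characters have the fixed order that a stream requires.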

The Stream Classes

Most of the classes that work directly with streams are part of the java.io package. The two main classes are java.io.InputStream and java.io.OutputStream. These are abstract base classes for many different subclasses with more specialized abilities, including:

BufferedInputStream

BufferedOutputStream

ByteArrayInputStream

ByteArrayOutputStream

DataInputStream

DataOutputStream

FileInputStream

FileOutputStream

FilterInputStream

FilterOutputStream

LineNumberInputStream

ObjectInputStream

ObjectOutputStream

PipedInputStream

PipedOutputStream

PrintStream

PushbackInputStream

SequenceInputStream

StringBufferInputStream

Though I’ve included them here for completeness, the LineNumberInputStream and StringBufferInputStream classes are deprecated. They’ve been replaced by the LineNumberReader and StringReader classes, respectively.
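For instance, counting the lines in a string with the two replacement classes could look like the sketch below. LineCount and countLines() are hypothetical names, not part of the Java API.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;

class LineCount {

  // Reads the text line by line and returns the number of lines seen.
  static int countLines(String text) throws IOException {
    LineNumberReader reader = new LineNumberReader(new StringReader(text));
    while (reader.readLine() != null) {
      // Nothing to do here; the reader tracks the line number itself.
    }
    int lines = reader.getLineNumber();
    reader.close();
    return lines;
  }
}
```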

Sun would also like to deprecate PrintStream. In fact, the PrintStream() constructors were deprecated in Java 1.1, though undeprecated in Java 2. Part of the problem is that System.out is a PrintStream; therefore, PrintStream is too deeply ingrained in existing Java code to deprecate and is thus likely to remain with us for the foreseeable future.

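The java.util.zip package contains four input stream classes and four output stream classes. Most of them deal with compressed data: the inflater, GZIP, and zip input streams read data in a compressed format and return it in uncompressed format, while the deflater, GZIP, and zip output streams accept data in uncompressed format and write it in compressed format. The two checked streams do not compress anything; instead, they maintain a checksum of the data that passes through them. These will be discussed in Chapter 9.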

CheckedInputStream

CheckedOutputStream

DeflaterOutputStream

GZIPInputStream

GZIPOutputStream

InflaterInputStream

ZipInputStream

ZipOutputStream
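As a small taste of these classes, the sketch below wraps a CheckedInputStream around an in-memory source and asks for the CRC-32 checksum once all the data has been read. ChecksumDemo and crcOf() are names invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

class ChecksumDemo {

  // Reads every byte through a checked stream, which updates the
  // checksum as a side effect, then returns the final CRC-32 value.
  static long crcOf(byte[] data) throws IOException {
    CheckedInputStream in =
        new CheckedInputStream(new ByteArrayInputStream(data), new CRC32());
    while (in.read() != -1) {
      // The bytes themselves are ignored; only the checksum matters here.
    }
    long value = in.getChecksum().getValue();
    in.close();
    return value;
  }
}
```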

The java.util.jar package includes two stream classes, one for reading JAR archives and one for writing them. These will also be discussed in Chapter 9.

JarInputStream

JarOutputStream

The java.security package includes a couple of stream classes used for calculating message digests:

DigestInputStream

DigestOutputStream
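A digest stream can be sketched roughly as follows. DigestDemo and digestWhileReading() are hypothetical names, and SHA-256 is simply one algorithm assumed to be available from the installed security providers.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestDemo {

  // Reads the data through a digest stream, which feeds each byte to the
  // message digest as it passes, then returns the finished digest.
  static byte[] digestWhileReading(byte[] data)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("SHA-256");
    DigestInputStream in =
        new DigestInputStream(new ByteArrayInputStream(data), md);
    while (in.read() != -1) {
      // A real program would also use the bytes it reads here.
    }
    in.close();
    return md.digest();
  }
}
```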

The Java Cryptography Extension (JCE) adds two classes for encryption and decryption:

CipherInputStream

CipherOutputStream

These four streams will be discussed in Chapter 10.
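In the meantime, here is a rough sketch of the cipher streams in use. Everything named CipherStreamDemo is hypothetical, and the fixed key and initialization vector are assumptions made purely so the example is repeatable; real code must never hardcode either. The choice of AES in CBC mode is likewise just one possibility.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class CipherStreamDemo {

  // Illustration only: a hardcoded 128-bit key and IV are not secure.
  private static final byte[] KEY =
      {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
  private static final byte[] IV = new byte[16];

  private static Cipher newCipher(int mode) throws GeneralSecurityException {
    Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
    cipher.init(mode, new SecretKeySpec(KEY, "AES"), new IvParameterSpec(IV));
    return cipher;
  }

  // Bytes written to the cipher output stream arrive at the sink encrypted.
  static byte[] encrypt(byte[] plain) throws GeneralSecurityException, IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    CipherOutputStream out =
        new CipherOutputStream(sink, newCipher(Cipher.ENCRYPT_MODE));
    out.write(plain);
    out.close();  // closing writes the final padded block
    return sink.toByteArray();
  }

  // Bytes read from the cipher input stream come back decrypted.
  static byte[] decrypt(byte[] encrypted) throws GeneralSecurityException, IOException {
    CipherInputStream in = new CipherInputStream(
        new ByteArrayInputStream(encrypted), newCipher(Cipher.DECRYPT_MODE));
    ByteArrayOutputStream plain = new ByteArrayOutputStream();
    int b;
    while ((b = in.read()) != -1) {
      plain.write(b);
    }
    in.close();
    return plain.toByteArray();
  }
}
```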

Finally, there are a few random stream classes hiding inside the sun packages, such as sun.net.TelnetInputStream and sun.net.TelnetOutputStream. However, these are deliberately hidden from you and are generally presented as instances of java.io.InputStream or java.io.OutputStream only.
