Chapter 4. Packages

One of the least appreciated good parts of Java can be found at the beginning of every Java file. This feature is so ubiquitous that most experienced Java programmers don’t even notice it, much less take the time to think about what it does for them and how to use it correctly. I am speaking of the package feature of Java, along with the notion of importing from some other package. This feature, and the protection mechanisms that are part of it, is one of the simple tools that allow large-scale software to be developed in the language.

It is easy to ignore the packaging system, to use it carelessly, or to simply not give it much thought until it causes you trouble. But rightly understood, the package mechanism allows the design of the overall software system to be disentangled and made independent in a number of important ways. If your package declarations don’t reflect your design decisions, you aren’t using the system correctly (although you will hardly be singled out for that sin). If they do, you have added another mechanism for bringing understanding and isolation to your system, making it easier to comprehend, develop, and maintain.

The Basics

The first bit of information that the compiler sees in any Java source file is the package declaration. There can be lots of commentary prior to this declaration (for example, it is common practice to put any licensing or copyright information in a comment at the very beginning of a source file), but the compiler discards all of that. The first thing that the compiler needs to know, and probably the first thing a programmer should think about, is the package in which the contents of the source file reside.

A package declaration is simply the keyword package, followed by some name, followed by a semicolon. The name of the package may be a dot-separated list of simple names, in which case a package hierarchy is being identified. For example, if the first noncomment, nonblank line in a source file is:

package com.sun.foo.bar;

you know that the code in the file is in the bar package, which is part of the foo package, which is part of the sun package, which is part of the com package.

The basic function of a package is to create a namespace. All of the names that are externally visible in a source file are scoped to that package. A name that occurs in one package can appear in a different package, and the two names will be seen as distinct. This by itself gives an independence of structure to what you are doing; as long as you are working in a different package (and namespace) than the one I am working in, we don’t have to coordinate the names that we use if our work is going to be used together.
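The JDK itself offers a convenient illustration: both the java.util and java.sql packages declare a class whose simple name is Date. The minimal sketch below (TwoDates and its methods are illustrative names, not from the text) refers to each through its fully qualified name and checks that Java treats the two as distinct types.

```java
// Two JDK packages each declare a class whose simple name is Date.
// Fully qualified names keep them apart, so neither needs an import.
public class TwoDates {
    // True: both classes report the simple name "Date".
    static boolean sameSimpleName() {
        return java.util.Date.class.getSimpleName()
                .equals(java.sql.Date.class.getSimpleName());
    }

    // False: the package is part of each class's identity.
    static boolean sameClass() {
        return java.util.Date.class.equals(java.sql.Date.class);
    }

    public static void main(String[] args) {
        System.out.println(sameSimpleName()); // true
        System.out.println(sameClass());      // false
        System.out.println(java.util.Date.class.getName()
                + " vs " + java.sql.Date.class.getName());
    }
}
```

java.sql.Date happens to extend java.util.Date, but that relationship does not make them the same class; the package remains part of each name.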

Because having separate namespaces is important, it also is important that the names of the packages themselves be different. Java is a name-equivalence language and environment, which means that things with the same name are the same. Thus, if I have two classes with exactly the same name but very different implementations, Java won’t tell them apart; which one actually gets used will depend on the order in which the class files are encountered. Once some class with a name is loaded, any class with that same name will use the loaded implementation.[12] Making sure that your package names are unique lets you avoid unintentionally introducing a name that is the same as one introduced by someone else.

Of course, generating a unique name is not as easy as it sounds, especially if you are building some code that might be used by others at some time in the future and at some as-yet unknown place. This is really an instance of the general problem of generating unique identifiers in a distributed system (with you and all of the other programmers who use Java as nodes in the distributed system), made somewhat more complex by the lack of good connectivity between the nodes (programmers) in this particular system and the fact that the nodes are all programmed to behave in various nondeterministic ways (or, if deterministic, using an algorithm that is currently unknown). All of which is geek-speak for saying that you never know what people are going to do, and there is no way to find out until it is too late.

To help ensure that package names are in fact unique, a convention was started early in the history of Java development that a package name should start with the domain name (in reverse order) of the organization from which the package originates. Thus, packages that originate from Sun Microsystems, Inc. start with the prefix com.sun, those from Hewlett-Packard, com.hp, and those from Harvard University, edu.harvard. It is up to the organization to figure out how to generate unique names, but this is a more local problem (although for these organizations, no less complex). Finally, there are some package names that are reserved for the system itself; these include the java and javax names.
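The reversal is mechanical enough to sketch. The Java Language Specification's naming conventions also suggest how to patch up domain labels that are not legal identifiers: replace illegal characters such as hyphens with underscores, prefix an underscore to a component that starts with a digit, and append one to a component that is a keyword. The helper fromDomain below is a hypothetical name, the sample domains beyond sun.com, hp.com, and harvard.edu are made up, and the keyword check assumes javax.lang.model.SourceVersion, which ships with the standard JDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import javax.lang.model.SourceVersion;

public class PackagePrefix {
    // Hypothetical helper: "harvard.edu" -> "edu.harvard".
    static String fromDomain(String domain) {
        List<String> parts = new ArrayList<>();
        for (String label : domain.toLowerCase(Locale.ROOT).split("\\.")) {
            if (label.isEmpty()) {
                continue; // tolerate stray dots, such as a trailing "."
            }
            // Characters not allowed in an identifier (e.g. '-') become '_'.
            StringBuilder sb = new StringBuilder();
            for (char c : label.toCharArray()) {
                sb.append(Character.isJavaIdentifierPart(c) ? c : '_');
            }
            String part = sb.toString();
            // A component that cannot start an identifier (e.g. "7up") gets a leading '_'.
            if (!Character.isJavaIdentifierStart(part.charAt(0))) {
                part = "_" + part;
            }
            // A component that is a keyword (e.g. "int") gets a trailing '_'.
            if (SourceVersion.isKeyword(part)) {
                part = part + "_";
            }
            parts.add(part);
        }
        Collections.reverse(parts); // most general label first
        return String.join(".", parts);
    }

    public static void main(String[] args) {
        System.out.println(fromDomain("sun.com"));         // com.sun
        System.out.println(fromDomain("harvard.edu"));     // edu.harvard
        System.out.println(fromDomain("7up.example.com")); // com.example._7up
    }
}
```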

Of course, there are times when I’m not writing code for any particular organization but just hacking on my own. I could, in those cases, use the same prefix as I would if I were writing for an organization. Or I could (and do) use some other string of names that makes me relatively sure that the package will be unique. Email addresses work just fine, as do personal domains. It is increasingly common for individuals to have network-unique names, so there is almost always a starting prefix that you can use for your package names.

Every source file in Java declares a package, even those that try not to. If you don’t have a package declaration at the beginning of a file, the contents of that file are placed in the default unnamed package. The unnamed package is a form of namespace limbo, where code written by confused, obstinate, or lazy programmers is placed until they evolve to a higher life form. There is no good reason to place anything that you do in the unnamed package, so just say no.

Items within a single package can refer to each other by name. From another package, one way you can refer to an entity is by using the fully qualified name of the entity, which is the package name of the package the entity is in, followed by a “.”, followed by the name of the entity. So if my com.sun.foo.bar package contains a class by the name of baz, I can refer to that class from outside the package using the name com.sun.foo.bar.baz. This is fine if I only want to refer to the class baz once or twice, but if I want to refer to class baz more often, this gets cumbersome.

The alternative is to import the name of baz into my current namespace. This is done, not surprisingly, by using the import statement. An import statement takes the form of the keyword import, followed by the fully qualified name that you want to import into the current namespace, followed by a semicolon. Importing the name in such a way makes it visible in the namespace of the source file, and also tells the compiler that it needs to refer to (and perhaps compile) the package from which the import comes. So if I want to refer to baz a number of times in my source, I can include the statement:

import com.sun.foo.bar.baz;

and then I can simply use the name baz in my code. I could also import the entire namespace by having the statement:

import com.sun.foo.bar.*;

This would make all of the names defined in the namespace com.sun.foo.bar visible to this code. This more general form of the import statement is quite popular, but should be avoided if you can. By importing more than you need, you are polluting the namespace of your own code, and making it more likely that you will clash with some name that is defined in the other package. You are also introducing extra dependencies into your code. Most of the time this won’t make any real difference, but when it does, it leads to problems that are hard to identify and fix. Your code is also easier for others to read if you avoid importing entire namespaces, since the reader can go to the top of a file and use the import statements to find out which package contains each imported name. This advantage is somewhat diminished by the navigation functions in modern interactive development environments, but you can’t yet assume that all of your colleagues use such an environment. Be nice to those who don’t; they suffer enough as it is.
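Two JDK packages that both define Date make the hazard easy to provoke. In this sketch (ImportClash is an illustrative name), the two wildcard imports by themselves would turn any use of the simple name Date into a compile-time ambiguity error; the explicit single-type import shadows both wildcards and settles which Date is meant.

```java
import java.sql.*;      // on-demand: makes java.sql.Date (and much else) visible
import java.util.*;     // on-demand: makes java.util.Date (and much else) visible
import java.util.Date;  // single-type import: shadows both on-demand imports of Date

public class ImportClash {
    // Without the single-type import above, this use of Date would not compile,
    // because the two wildcard imports would make the simple name ambiguous.
    static String whichDate() {
        return Date.class.getName();
    }

    public static void main(String[] args) {
        System.out.println(whichDate()); // java.util.Date
    }
}
```

Deleting the last import line is a quick way to see the compiler's complaint for yourself.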

As a general rule, you are better off importing only those parts of another package that you really need (and that you actually refer to) rather than the whole package. Modern IDEs make this pretty straightforward, as they will import names automatically when they are used. If you really do need to import all of the names defined in another package, you might want to think about the design of your system, as you have two distinct packages that are so intertwined that you should probably only have one (or you have missed the real line that should separate the packages).

On rare occasions you will find yourself unable to import a name into a namespace. This is when that name already occurs within that namespace. Importing in such a circumstance would lead to an ambiguous name, so the compiler does not let the import take over: a wildcard import never overrides a name declared in the package, and a single-type import of a name declared in the same file is a compile-time error. If you really need to refer to something from a different package that has the same name as something in the current package, you need to refer to the external entity using the full name (that is, with the package name as a prefix). Sometimes this can’t be avoided, especially when you are using code written by someone else and he has picked all the good names.
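A small sketch with illustrative names shows the rule in action. A top-level class named List declared in the file's own package wins over the List that the wildcard import of java.util would otherwise supply, so the JDK's interface has to be written out in full.

```java
import java.util.*; // would normally make java.util.List available as "List"

public class ShadowDemo {
    static String demo() {
        List mine = new List();                            // our own List, declared below
        java.util.List<String> theirs = new ArrayList<>(); // java.util.List, by full name
        theirs.add(mine.describe());
        return theirs.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [our own List]
    }
}

// A top-level class in this file's package: its name beats the wildcard import.
// Adding "import java.util.List;" to this file would be a compile-time error.
class List {
    String describe() {
        return "our own List";
    }
}
```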

What can be avoided is doing this to yourself. In our example, we have named our implementation of the Batter interface the BatterImpl class. But we could simply name our implementation class the Batter class, and say (in our source):

public class Batter
    implements com.oreilly.javaGoodParts.examples.statistics.Batter {
  // ... the Batter methods, implemented here
}

This would technically work, in the sense that it would compile. But it would also be very confusing to anyone trying to read or learn the code. It will even be confusing to you at some time in the future when you try to maintain or extend the code you originally wrote. Having separate namespaces lets you do this, but just because you can do it doesn’t mean that you should.

Packages and Access Control

The package system in Java does more than just establish separate namespaces for the various parts of a system. The package system also plays into the access scheme used in the language.

Like most object-oriented languages, Java allows the programmer to declare who can access what parts (if any) of an object. As befits its C++ heritage, Java allows fields and methods to be accessed by any part of a class in which those fields and methods are declared. But outside of an object, access is defined by the access control modifier associated with the field or method. Fields or methods that are labeled private can be accessed only from within the defining class. Those that are labeled protected can be accessed either by other parts of the class or by any class that extends the defining class. Finally, those that are marked public may be accessed by anyone. This much is familiar to those who come to Java from C++.

But Java has an additional access category connected to the package system. Unless a field or method is marked as private, that field or method is also accessible to anything that resides in the same package as the field or method. This is no surprise for fields or methods that have been labeled as public, since anyone from any package can access such fields. But it is somewhat more surprising for those fields or methods that are marked as protected, since it allows access from methods that are in the package but have no relationship to the defining class through the type hierarchy. In fact, this introduces a fourth form of access specification, which is marked by there being no access specification at all. If a field or method is not labeled as being private, protected, or public, that field or method is said to have package access, which means that it is accessible by anything in the package, but not by anything else.
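Because every top-level class in a source file belongs to the package that file declares, a single file is enough to see package access at work. In this sketch (Tally and Auditor are hypothetical classes), Auditor has no type relationship to Tally, yet it can read Tally's unmarked and protected fields simply because the two share a package.

```java
public class PackageAccessDemo {
    public static void main(String[] args) {
        System.out.println(new Auditor().inspect(new Tally())); // 9
    }
}

// Tally and Auditor share a package because they are declared in the same file
// (any other file declaring the same package would do just as well).
class Tally {
    private int secret = 1;            // private: visible only inside Tally
    int packageCount = 2;              // no modifier: package access
    protected int protectedCount = 3;  // package access, plus subclasses anywhere
    public int publicCount = 4;        // visible everywhere

    int revealSecret() {
        return secret;                 // Tally's own code may read its private field
    }
}

// Auditor is not a subclass of Tally; it gets in only by sharing the package.
class Auditor {
    int inspect(Tally t) {
        // t.secret would not compile here.
        return t.packageCount + t.protectedCount + t.publicCount;
    }
}
```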

This gives a hierarchy of access possibilities for the programmer. At the most restricted are those fields and methods marked private, which can be accessed only from within the class in which they occur. Next most restrictive are those with no declared access specification. These have package access, which makes them available to anything that is in the same package, but keeps them from the prying eyes of anything in any other package. The next level of access, protected, loosens the restrictions on package protection to include any classes that are extensions of the class in which the method or field is defined, no matter where in the set of packages those extensions are defined. Finally, there are those methods and fields that are marked as public, which can be accessed from anywhere.
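The ordering can also be read back from a compiled class. This sketch (AccessLevels and its helpers are illustrative names) uses java.lang.reflect.Modifier, which reports private, protected, and public directly; package access shows up only as the absence of all three.

```java
import java.lang.reflect.Modifier;

public class AccessLevels {
    // Sample fields, one per access level; they exist only to be inspected.
    private int a;
    int b;
    protected int c;
    public int d;

    // Maps a member's modifier bits onto the four access levels.
    static String accessLevel(int modifiers) {
        if (Modifier.isPrivate(modifiers)) return "private";
        if (Modifier.isProtected(modifiers)) return "protected";
        if (Modifier.isPublic(modifiers)) return "public";
        return "package"; // no access modifier at all means package access
    }

    static String levelOf(Class<?> type, String fieldName) {
        try {
            return accessLevel(type.getDeclaredField(fieldName).getModifiers());
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no field named " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"a", "b", "c", "d"}) {
            System.out.println(name + ": " + levelOf(AccessLevels.class, name));
        }
    }
}
```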

It is important to keep in mind one difference between the hierarchy formed by packages and the hierarchy formed by classes. The hierarchy formed by classes is inclusive; that is, an object that is an instance of a class is an instance of any class that is in the hierarchy above that class. This means that the protected access specifier opens the access to the field or method to any part of any class that is a continuation of the class hierarchy rooted in the class in which the method or field is declared. It is best to think of the protected access specifier as giving access to any object that is at least of the type in which the field or method is declared.

Packages, like classes, form a hierarchy that is a tree. But membership in a package is not polymorphic; that is, something that is defined in the package foo.bar.baz is not part of the foo or the foo.bar package. Fields and methods that have package protection can be accessed only by things that are in exactly the same package. If there is a field or method that is package protected in the package foo.bar, don’t expect to be able to access it in the package foo.bar.baz.
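The flat nature of package membership is visible through reflection as well. The hypothetical helper below treats two classes as sharing package access only when their package names match exactly; it also compares class loaders, since at run time the JVM identifies a package by its name together with the class loader that defined it. It relies on Class.getPackageName, which was added in Java 9.

```java
import java.lang.reflect.Method;

public class SamePackage {
    // Package access needs an exact match of package names (and, at run time,
    // of the defining class loader); sharing a name prefix counts for nothing.
    static boolean sharePackage(Class<?> a, Class<?> b) {
        return a.getPackageName().equals(b.getPackageName())
                && a.getClassLoader() == b.getClassLoader();
    }

    public static void main(String[] args) {
        System.out.println(sharePackage(String.class, Integer.class)); // true: both java.lang
        System.out.println(sharePackage(String.class, Method.class));  // false: java.lang vs java.lang.reflect
    }
}
```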

Classes and interfaces are also subject to access specifications. However, with the exception of inner classes, the possible access specifications for these parts of the language are limited to either package access (in which case the class or interface is not labeled) or public access. A little thought convinces one that these are the only access specifiers that make sense. A class or interface that can be called only by itself is not very interesting. Neither is one that can only be called by subclasses. Although from a purely linguistic point of view, this lack of symmetry may be troubling, the fact that the language keeps you from doing something useless more than makes up for it.

You can also put some modifiers on the methods defined in an interface, but what you do here doesn’t really matter. Only two modifiers are legal for the abstract methods of an interface (Java 8 later added default and static interface methods, and Java 9 private ones, but those are a different story). You can mark an interface method as abstract, but doing so has no effect, since all such interface methods are unimplemented at the level of the interface and are implemented only in classes that implement the interface. Likewise, you can mark an interface method as public, but this is documentation at best; interface methods are implicitly public, and an interface method is accessible to any code that can access the interface. If the interface is marked as public, then all of the methods of that interface are public, even if that access specifier does not preface the method declaration. If the interface has only package visibility, its methods are still implicitly public, but only code that can see the interface (that is, code in the same package) can call them through it.
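A short sketch makes the implicit modifiers visible. Scorer, FixedScorer, and InterfaceMethodDemo are illustrative names. Even though Scorer itself is package-private and its method carries no modifiers, reflection reports the method as public and abstract; what keeps outsiders away is the visibility of the interface type. The same rule explains a common compile error: an implementing class must declare the method public.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class InterfaceMethodDemo {
    // Reports whether the compiler recorded Scorer.score() as public and abstract.
    static boolean scoreIsPublicAndAbstract() {
        try {
            Method m = Scorer.class.getMethod("score");
            int mods = m.getModifiers();
            return Modifier.isPublic(mods) && Modifier.isAbstract(mods);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(scoreIsPublicAndAbstract()); // true
        System.out.println(new FixedScorer().score());  // 42
    }
}

// A package-private interface whose method carries no modifiers at all.
interface Scorer {
    int score(); // implicitly public and abstract
}

class FixedScorer implements Scorer {
    // Must be public: leaving the modifier off would try to narrow the
    // interface method's access, which the compiler rejects.
    @Override
    public int score() {
        return 42;
    }
}
```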

An Example

Let’s go back to our baseball statistics system and apply some good practice with regard to packages. We at least have a package declaration on all of our files (they are in the package examples), but it’s hard to argue that the particular package name we chose is going to be globally unique. So we should probably start by picking a prefix for the package that will give us a higher confidence that the namespace for our system is unique. When I’m doing this at work, I prefix my packages with com.sun, which then only requires having a unique package among those developed within one company (the prefix, needless to say, is a lot longer; the company namespace is only the beginning). But I’m doing this work as part of a book, so I will use the prefix com.oreilly.javaGoodParts.examples.

This will ensure that the names for classes and interfaces developed within the package structure are unique (unless someone else is writing a book with the same title for the same publisher). But we are going to go further than that and start breaking apart the structure of the baseball statistics package, so that we can cluster parts that need to interact within the same package and isolate those that don’t need to interact in separate packages.

The first separation we can do is between the interfaces that define the external face of the system and the implementations of those interfaces. We can place all of the interface definitions (currently, those in the files Batter.java, Catcher.java, and Fielder.java) in the package:

com.oreilly.javaGoodParts.examples.statistics

by starting each of these files with the line:

package com.oreilly.javaGoodParts.examples.statistics;

The implementation classes will be placed in another package. For the moment, we will only have a single implementation package:

com.oreilly.javaGoodParts.examples.impl

although that might change as the implementation gets more complex. The reason for this split is to allow clients of the basic statistics storing classes to be dependent only on the interfaces that define those classes, not on the implementation. By placing the interfaces in a separate package, we can have multiple implementations (all in their own package or packages) and the client will never be directly tied to any of them.
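The payoff of the split can be sketched in a single file, assuming a deliberately simplified interface. AtBatLog and its two implementations are hypothetical stand-ins, not the book's Batter classes, and they share one file (and thus one package) only so the example runs as is; in the layout described above, the interface would live in the statistics package and each implementation in an impl package. The point is that summarize compiles against the interface alone.

```java
import java.util.ArrayList;
import java.util.List;

public class InterfaceClientDemo {
    // Client code: it names only the interface, never an implementation class.
    static String summarize(AtBatLog log) {
        return log.hits() + " for " + log.atBats();
    }

    public static void main(String[] args) {
        ListAtBatLog detailed = new ListAtBatLog();
        detailed.record(true);
        detailed.record(false);
        System.out.println(summarize(new CountingAtBatLog(3, 10))); // 3 for 10
        System.out.println(summarize(detailed));                    // 1 for 2
    }
}

// In the package layout described above, this would sit in ...examples.statistics.
interface AtBatLog {
    int hits();
    int atBats();
}

// Two independent implementations; each could sit in its own ...impl package.
class CountingAtBatLog implements AtBatLog {
    private final int hits;
    private final int atBats;

    CountingAtBatLog(int hits, int atBats) {
        this.hits = hits;
        this.atBats = atBats;
    }

    @Override public int hits() { return hits; }
    @Override public int atBats() { return atBats; }
}

class ListAtBatLog implements AtBatLog {
    private final List<Boolean> results = new ArrayList<>();

    void record(boolean hit) {
        results.add(hit);
    }

    @Override public int hits() { return (int) results.stream().filter(h -> h).count(); }
    @Override public int atBats() { return results.size(); }
}
```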

Splitting our code into separate packages is a form of refactoring that can ripple through our code. Now that the interfaces are in a separate package, the classes that refer to those interfaces (which are all of them) need to import the interfaces, since they are no longer in the same namespace. So along with a different package declaration, all of the implementation files will need to be changed to include an import statement. For example, our BatterImpl class now needs to be able to see the Batter interface, so we need to include the line:

import com.oreilly.javaGoodParts.examples.statistics.Batter;

for the class to compile. We could, of course, have included everything in the statistics package by replacing the import with:

import com.oreilly.javaGoodParts.examples.statistics.*;

But that would have included more than what is used in the BatterImpl class. I find it good practice to include only those parts of a package that are necessary. If nothing else, too long a list of imports from another package shows an interconnection between the package being imported and the package doing the importing that might indicate a design flaw. If you have to import too much from a different package, you have a lot of dependencies between them, and your package abstractions may not be correct.

The number of places that may have to be changed can get out of hand rather rapidly, especially if you are doing this kind of refactoring over a large code base. Fortunately, most modern interactive development environments (in particular, both Eclipse and NetBeans) have very good facilities that automate all or nearly all of the changes required for this kind of refactoring. This is one of the places where a good IDE really shines, although traditionalists will also be able to accomplish the same sort of thing with scripts.

An interesting question in this refactoring is where to put our exception class, NotEnoughAtBatsException. This is a class, as are all exceptions, and so would generally be part of the package that contains implementations. This would argue for placing it in the com.oreilly.javaGoodParts.examples.impl package, but the definition of the Batter interface in the com.oreilly.javaGoodParts.examples.statistics package refers to this exception. So our choice is either to import the exception from the implementation package into the interface package or to include a particular implementation in our set of interfaces.

Neither of these choices is particularly clean. The purpose of defining a set of interfaces is to allow those interfaces to be independent of the implementation classes. Importing an exception class from an implementation package explicitly ties the interface to at least part of a particular implementation. But including the exception in the interface package means that the particular implementation of the exception is part of the abstract definition of the set of interfaces, which I have argued in Chapter 2 is a bad idea.

The real cause of this problem is that exceptions in Java cannot be defined as interfaces and can be defined only as extensions of the Exception class. Since we want to declare exceptions as part of the signature of methods that are first defined in interfaces, there is no way of avoiding mixing these classes with our (more abstract) interface definitions. As language problems go, this one is actually pretty benign. Exceptions tend to be fairly simple and often carry information that is going to be needed by any implementation’s exception handlers. Where you place them is more a matter of personal taste than design dictates (actually, most design dictates boil down to personal taste, which doesn’t make them any less correct, but that’s the subject of a different book). I prefer putting the exceptions thrown by methods defined in an interface in the same package as the interfaces, and acknowledge that (in this one case) there are implementation details that leak into the interface definitions. This means that I would be importing two items from the interface namespace when I implement the BatterImpl class, so the beginning of that class would look something like:
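Here is a compact sketch of that arrangement, again under simplified assumptions. SeasonBatter, SimpleSeasonBatter, the minimum of three at-bats, and the message text are all hypothetical; only the exception's name comes from the text. Client code catches NotEnoughAtBatsException without ever naming the implementation class that throws it.

```java
import java.util.Locale;

public class ExceptionPlacementDemo {
    // Client code: sees only the interface and the exception it declares.
    static String describe(SeasonBatter b) {
        try {
            return String.format(Locale.ROOT, "%.3f", b.average());
        } catch (NotEnoughAtBatsException e) {
            return "unavailable: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new SimpleSeasonBatter(3, 10))); // 0.300
        System.out.println(describe(new SimpleSeasonBatter(1, 2)));  // unavailable: only 2 at-bats, need 3
    }
}

// Lives beside the interface (the "statistics" side), even though it is a class.
class NotEnoughAtBatsException extends Exception {
    NotEnoughAtBatsException(String message) {
        super(message);
    }
}

interface SeasonBatter {
    double average() throws NotEnoughAtBatsException;
}

// An implementation (the "impl" side); the threshold is a hypothetical value.
class SimpleSeasonBatter implements SeasonBatter {
    static final int MIN_AT_BATS = 3;
    private final int hits;
    private final int atBats;

    SimpleSeasonBatter(int hits, int atBats) {
        this.hits = hits;
        this.atBats = atBats;
    }

    @Override
    public double average() throws NotEnoughAtBatsException {
        if (atBats < MIN_AT_BATS) {
            throw new NotEnoughAtBatsException(
                    "only " + atBats + " at-bats, need " + MIN_AT_BATS);
        }
        return (double) hits / atBats;
    }
}
```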

package com.oreilly.javaGoodParts.examples.impl;

import com.oreilly.javaGoodParts.examples.statistics.NotEnoughAtBatsException;
import com.oreilly.javaGoodParts.examples.statistics.Batter;

public class BatterImpl implements Batter {
   // ... implementation of the Batter methods
}

You might choose to do things differently, and I would understand. But making use of the package system to give yourself a way of grouping interacting components of your system is a good thing about the language, so you shouldn’t use the fact that exceptions keep it from being pure and perfect as a reason not to use it as part of your design. Nor should you use it as an excuse not to think about exceptions, a subject I discussed in Chapter 3.

Packages and the Filesystem

While the inability to have some packages that are implementation-independent is regrettable, the required interaction between the package system and the filesystem is both regrettable and a pain. Simply put, whenever you declare a component in a package name, you need a directory in your filesystem that corresponds to that component. This is where the compiler will look for the source files that are defined in that component, and this is where the classloaders will look for the class files that contain the compiled binaries for the classes in those packages. Most of us who use Java have become so used to this that we don’t even think about it. But it is strange and inconvenient, and has built up enough supporting cruft that it is often confusing (and the source of interesting problems, which we will see later). So I will end this chapter with some reflections on this oddity.

Just to remind you, if you have a class that begins with the line:

package com.oreilly.javaGoodParts.examples.impl;

then the source for that class will need to be in a directory (from wherever you start) with the name (on an adult operating system):

com/oreilly/javaGoodParts/examples/impl

that is, in the impl directory placed in the examples directory placed in the javaGoodParts directory placed in the oreilly directory placed in the com directory. At the lower level of the package naming hierarchy, this doesn’t seem all that unnatural. You have different directory locations for the files that implement a related part of the system, and those parts reflect the package structure. Even at some of the higher levels this seems to make some sense; as a peer to our examples, we will have the tests for those examples, clustered in their own namespace and in their own directory.
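The mapping itself is simple enough to sketch. The helper pathFor below (a hypothetical name) turns a package name into nested directories under some root, using java.nio.file.Path so that the platform's own separator is applied; the root names src and classes are assumptions, not anything Java requires.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PackagePaths {
    // Builds root/com/oreilly/.../ClassName + suffix, one directory per package component.
    static Path pathFor(Path root, String packageName, String className, String suffix) {
        Path dir = root;
        if (!packageName.isEmpty()) { // the unnamed package maps to the root itself
            for (String part : packageName.split("\\.")) {
                dir = dir.resolve(part);
            }
        }
        return dir.resolve(className + suffix);
    }

    public static void main(String[] args) {
        String pkg = "com.oreilly.javaGoodParts.examples.impl";
        // e.g. src/com/oreilly/javaGoodParts/examples/impl/BatterImpl.java on Unix-like systems
        System.out.println(pathFor(Paths.get("src"), pkg, "BatterImpl", ".java"));
        System.out.println(pathFor(Paths.get("classes"), pkg, "BatterImpl", ".class"));
    }
}
```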

At the next level, things get a little less sensible. I may be doing multiple books for O’Reilly, but this doesn’t seem to be something that should be reflected in the filesystem on my computer. Sure, I’ll keep the different books in different directories, but I might not want them to share a parent directory. And at the highest level, this just seems odd. I may be doing work for organizations that appear in different DNS domains on the Internet, but this hardly seems like a reasonable way to organize my local files.

This is where we get to see the wisdom of the adage “History clarifies stupidity.”[13] What is meant by this adage is that if you understand the history of some set of decisions, you can often see why they seemed like a good idea at the time. This is the best way to understand the tie between Java packages and the filesystem.

When Java was first being implemented, there were many projects within Sun (and, no doubt, elsewhere) that were trying to build a programming environment in which the source and binary files would be kept in a database. There were lots of reasons why this would have been a good idea: it would have helped with incorporating a version control system, releases could be done more consistently, and queries over the structures held in the database could help programmers understand the structure of the system. The Java packaging system would have fit into such an environment beautifully. The unique names would act as primary keys, and the hierarchical nature of the names would map naturally into all kinds of database structures.

But those environments weren’t quite ready at the time programmers were starting to use Java widely. In fact, at the time the most common programming environments were emacs (or, inside of Sun, vi) and command lines in terminal windows. So the decision was made to use the filesystem as a cheap emulator of a database, just until the integrated environment using a real database was ready.

Of course, the integrated development environment with database never appeared. So we still use the filesystem as a database surrogate, and our package names have to be reflected in those filesystems. Fortunately, the IDEs that have appeared take much of the work out of using the filesystem in that way, doing all of the extra directory creation and transitions for us. It is less of a pain than it once was, but can still lead to confusion and lots of extra directories in the source code structure.



[12] This is actually not precisely true, because of classloaders (see the discussion in Chapter 2). If we want to be completely precise, we would have to say that the two classes need to have the same name and have to be loaded by classloaders that are in the same classloader hierarchy. But this book is supposed to be about the good parts of Java, so I’m going to avoid talking about classloaders whenever possible.

[13] I first heard this from my manager at the time, Mark Hodapp. I find it useful to remind myself of this fairly often, even though the initial interpretation of the adage, in which it is taken to mean that stupidity is more clearly seen in the light of history, is not the one he meant.
