Chapter 4. Write Code Once

Number one in the stink parade is duplicated code.

Kent Beck and Martin Fowler,
Bad Smells in Code

Guideline:

Do not copy code.
Do this by writing reusable, generic code and/or calling existing methods instead.
This improves maintainability because when code is copied, bugs need to be fixed at multiple places, which is inefficient and error-prone.

Copying existing code looks like a quick win—why write something anew when it already exists? The point is: copied code leads to duplicates, and duplicates are a problem. As the quote above indicates, some even say that duplicates are the biggest software quality problem of all.

Consider a system that manages bank accounts. In this system, money transfers between accounts are represented by objects of the Transfer class (not shown here). The bank offers checking accounts represented by class CheckingAccount:

public class CheckingAccount {
    private int transferLimit = 100;

    public Transfer makeTransfer(String counterAccount, Money amount)
        throws BusinessException {
        // 1. Check withdrawal limit:
        if (amount.greaterThan(this.transferLimit)) {
            throw new BusinessException("Limit exceeded!");
        }
        // 2. Assuming result is 9-digit bank account number, validate 11-test:
        int sum = 0;
        for (int i = 0; i < counterAccount.length(); i++) {
            sum = sum + (9-i) * Character.getNumericValue(
                counterAccount.charAt(i));
        }
        if (sum % 11 == 0) {
            // 3. Look up counter account and make transfer object:
            CheckingAccount acct = Accounts.findAcctByNumber(counterAccount);
            Transfer result = new Transfer(this, acct, amount);
            return result;
        } else {
            throw new BusinessException("Invalid account number!");
        }
    }
}

Given the account number of the account to transfer money to (as a string), the makeTransfer method creates a Transfer object. makeTransfer first checks whether the amount to be transferred does not exceed a certain limit. In this example, the limit is simply hardcoded. makeTransfer then checks whether the number of the account to transfer the money to complies with a checksum (see the sidebar “The 11-Check for Bank Account Numbers” for an explanation of the checksum used). If that is the case, the object that represents this account is retrieved, and a Transfer object is created and returned.

Now assume the bank introduces a new account type, called a savings account. A savings account does not have a transfer limit, but it does have a restriction: money can only be transferred to one particular (fixed) checking account. The idea is that the account owner chooses once to couple a particular checking account with a savings account.

A class is needed to represent this new account type. Suppose the existing class is simply copied, renamed, and adapted. This would be the result:

public class SavingsAccount {
    CheckingAccount registeredCounterAccount;

    public Transfer makeTransfer(String counterAccount, Money amount)
        throws BusinessException {
        // 1. Assuming result is 9-digit bank account number, validate 11-test:
        int sum = 0; 
        for (int i = 0; i < counterAccount.length(); i++) {
            sum = sum + (9 - i) * Character.getNumericValue(
                counterAccount.charAt(i));
        }
        if (sum % 11 == 0) {
            // 2. Look up counter account and make transfer object:
            CheckingAccount acct = Accounts.findAcctByNumber(counterAccount);
            Transfer result = new Transfer(this, acct, amount); 
            // 3. Check whether withdrawal is to registered counter account:
            if (result.getCounterAccount().equals(this.registeredCounterAccount))
            {
                return result;
            } else {
                throw new BusinessException("Counter-account not registered!");
            }
        } else {
            throw new BusinessException("Invalid account number!!");
        }
    }
}

: Start of code clone.
: End of code clone.

Both classes exist in the same codebase. By copying and pasting an existing class, we have introduced some duplicated code in the codebase. There are now two fragments (eight lines of code each) of consecutive lines of code that are exactly the same. These fragments are called code clones or duplicates.

Now suppose a bug is discovered in the implementation of the 11-test (the for loop that iterates over the characters in counterAccount). This bug now needs to be fixed in both duplicates. This is additional work, making maintenance less efficient. Moreover, if the fix is only made in one duplicate but the other is overlooked, the bug is only half fixed.

Important

Resist the temptation of gaining a short-term advantage by copying and pasting code. For every future adjustment to either duplicate, you will need to revisit all duplicates.

Coding is about finding generic solutions for specific problems. Either reuse (by calling) an existing, generic method in your codebase, or make an existing method more generic.

Types of Duplication

We define a duplicate or code clone as an identical piece of code at least six lines long. The line count excludes whitespace and comments, just like in the regular definition of “line of code” (see also Chapter 1). That means that the lines need to be exactly the same to be considered a duplicate. Such clones are called Type 1 clones. It does not matter where the duplicates occur. Two clones can be in the same method, in different methods in the same class, or in different methods in different classes in the same codebase. Code clones can cross method boundaries. For instance, if the following fragment appears twice in the codebase, it is considered one clone of six lines of code, not two clones of three lines each:

public void setGivenName(String givenName) {
    this.givenName = givenName;
}

public void setFamilyName(String familyName) {
    this.familyName = familyName;
}

The following two methods are not considered duplicates of each other even though they differ only in literals and the names of identifiers:

public void setPageWidthInInches(float newWidth) {
    float cmPerInch = 2.54;
    this.pageWidthInCm = newWidth * cmPerInch;
    // A few more lines.
}

public void setPageWidthInPoints(float newWidth) {
    float cmPerPoint = 0.0352777;
    this.pageWidthInCm = newWidth * cmPerPica;
    // A few more lines (same as in setPageWidthInInches).
}

Two fragments of code that are syntactically the same (as opposed to textually) are called Type 2 clones. Type 2 clones differ only in whitespace, comments, names of identifiers, and literals. Every Type 1 clone is always also a Type 2 clone, but some Type 2 clones are not Type 1 clones. The methods setPageWidthInInches and setPageWidthInPoints are Type 2 clones but not Type 1 clones.

The guideline presented in this chapter is about Type 1 clones, for two reasons:

Source code maintenance benefits most from the removal of Type 1 clones.
Type 1 clones are easier to detect and recognize (both by humans and computers, as detecting Type 2 clones requires full parsing).

The limit of six lines of code may appear somewhat arbitrary, since other books and tools use a different limit. In our experience, the limit of six lines is the right balance between identifying too many and too few clones. As an example, a toString method could be three or four lines, and those lines may occur in many domain objects. Those clones can be ignored, as they are not what we are looking for—namely, deliberate copies of functionality.

Motivation

To understand the advantages of a codebase with little duplication, in this section we discuss the effects that duplication has on system maintainability.

Duplicated Code Is Harder to Analyze

If you have a problem, you want to know how to fix it. And part of that “how” is where to locate the problem. When you are calling an existing method, you can easily find the source. When you are copying code, the source of the problem may exist elsewhere as well. However, the only way to find out is by using a clone detection tool. A well-known tool for clone detection is CPD, which is included in a source code analysis tool called PMD. CPD can be run from inside Eclipse as well as from Maven.

Important

The fundamental problem of duplication is not knowing whether there is another copy of the code that you are analyzing, how many copies exist, and where they are located.

Duplicated Code Is Harder to Modify

All code may contain bugs. But if duplicated code contains a bug, the same bug appears multiple times. Therefore, duplicated code is harder to modify; you may need to repeat bug fixes multiple times. This, in turn, requires knowing that a fix has to be made in a duplicate in the first place! This is why duplication is a typical source of so-called regression bugs: functionality that has worked normally before suddenly stops working (because a duplicate was overlooked).

The same problem holds for regular changes. When code is duplicated, changes may need to be made in multiple places, and having many duplicates makes changing a codebase unpredictable.

How to Apply the Guideline

To avoid the problem of duplicated bugs, never reuse code by copying and pasting existing code fragments. Instead, put it in a method if it is not already in one, so that you can call it the second time that you need it. That is why, as we have covered in the previous chapters, the Extract Method refactoring technique is the workhorse that solves many duplication problems.

In the example presented at the beginning of the chapter, the code that implements the checksum (which is part of the duplicate) is an obvious candidate for extraction. To resolve duplication using Extract Method, the duplicate (or a part thereof) is extracted into a new method which is then called multiple times, once from each duplicate.

In Chapter 2, the new extracted method became a private method of the class in which the long method occurs. That does not work if duplication occurs across classes, as in CheckingAccount and SavingsAccount. One option in that case is to make the extracted method a method of a utility class. In the example, we already have an appropriate class for that (Accounts). So the new static method, isValid, is simply a method of that class:

public static boolean isValid(String number) {
    int sum = 0;
    for (int i = 0; i < number.length(); i++) {
        sum = sum + (9 - i) * Character.getNumericValue(number.charAt(i));
    }
    return sum % 11 == 0;
}

This method is called in CheckingAccount:

public class CheckingAccount {
    private int transferLimit = 100;

    public Transfer makeTransfer(String counterAccount, Money amount)
        throws BusinessException {
        // 1. Check withdrawal limit:
        if (amount.greaterThan(this.transferLimit)) {
            throw new BusinessException("Limit exceeded!");
        }
        if (Accounts.isValid(counterAccount)) { 
            // 2. Look up counter account and make transfer object:
            CheckingAccount acct = Accounts.findAcctByNumber(counterAccount);
            Transfer result = new Transfer(this, acct, amount); 
            return result;
        } else {
            throw new BusinessException("Invalid account number!");
        }
    }

}

: Start of short clone (three lines of code).
: End of short clone (three lines of code).

And also in SavingsAccount:

public class SavingsAccount {
    CheckingAccount registeredCounterAccount;

    public Transfer makeTransfer(String counterAccount, Money amount)
        throws BusinessException {
        // 1. Assuming result is 9-digit bank account number,
        // validate with 11-test:
        if (Accounts.isValid(counterAccount)) { 
            // 2. Look up counter account and make transfer object:
            CheckingAccount acct = Accounts.findAcctByNumber(counterAccount);
            Transfer result = new Transfer(this, acct, amount); 
            if (result.getCounterAccount().equals(this.registeredCounterAccount))
            {
                return result;
            } else {
                throw new BusinessException("Counter-account not registered!");
            }
        } else {
            throw new BusinessException("Invalid account number!!");
        }
    }

}

: Start of short clone (three lines of code).
: End of short clone (three lines of code).

Mission accomplished: according to the definition of duplication presented at the beginning of this chapter, the clone has disappeared (because the repeated fragment is fewer than six lines of code). But the following issues remain:

Even though according to the definition, the clone has disappeared, there is still logic repeated in the two classes.
The extracted fragment had to be put in a third class, just because in Java every method needs to be in a class. The class to which the extracted method was added runs the risk of becoming a hodgepodge of unrelated methods. This leads to a large class smell and tight coupling. Having a large class is a smell because it signals that there are multiple unrelated functionalities within the class. This tends to lead to tight coupling when methods need to know implementation details in order to interact with such a large class. (For elaboration, see Chapter 6.)

The refactoring technique presented in the next section solves these problems.

The Extract Superclass Refactoring Technique

In the preceding code snippets, there are separate classes for a checking account and a savings account. They are functionally related. However, they are not related in Java (they are just two classes that each derive directly from java.lang.Object). Both have common functionality (the checksum validation), which introduced a duplicate when we created SavingsAccount by copying and pasting (and modifying) CheckingAccount. One could say that a checking account is a special type of a (general) bank account, and that a savings account is also a special type of a (general) bank account. Java (and other object-oriented languages) has a feature to represent the relationship between something general and something specific: inheritance from a superclass to a subclass.

The Extract Superclass refactoring technique uses this feature by extracting a fragment of code lines not just to a method, but to a new class that is the superclass of the original class. So, to apply this technique, you create a new Account class like so:

public class Account {
    public Transfer makeTransfer(String counterAccount, Money amount)
        throws BusinessException {
        // 1. Assuming result is 9-digit bank account number, validate 11-test:
        int sum = 0; 
        for (int i = 0; i < counterAccount.length(); i++) {
            sum = sum + (9 - i) * Character.
                getNumericValue(counterAccount.charAt(i));
        }
        if (sum % 11 == 0) {
            // 2. Look up counter account and make transfer object:
            CheckingAccount acct = Accounts.findAcctByNumber(counterAccount);
            Transfer result = new Transfer(this, acct, amount); 
            return result;
        } else {
            throw new BusinessException("Invalid account number!");
        }
    }
}

: Start of extracted clone.
: End of extracted clone.

The new superclass, Account, contains logic shared by the two types of special accounts. You can now turn both the CheckingAccount and SavingsAccount classes into subclasses of this new superclass. For CheckingAccount, the result looks like this:

public class CheckingAccount extends Account {
    private int transferLimit = 100;

    @Override
    public Transfer makeTransfer(String counterAccount, Money amount)
        throws BusinessException {
        if (amount.greaterThan(this.transferLimit)) {
            throw new BusinessException("Limit exceeded!");
        }
        return super.makeTransfer(counterAccount, amount);
    }
}

The CheckingAccount class declares its own member, transferLimit, and overrides makeTransfer. The makeTransfer method first checks to be sure the amount to be transferred does not exceed the limit for checking accounts. If that is the case, it calls makeTransfer in the superclass to create the actual transfer.

The new version of SavingsAccount works likewise:

public class SavingsAccount extends Account {
    CheckingAccount registeredCounterAccount;

    @Override
    public Transfer makeTransfer(String counterAccount, Money amount)
        throws BusinessException {
        Transfer result = super.makeTransfer(counterAccount, amount);
        if (result.getCounterAccount().equals(this.registeredCounterAccount)) {
            return result;
        } else {
            throw new BusinessException("Counter-account not registered!");
        }
    }
}

The SavingsAccount class declares registeredCounterAccount and, just like CheckingAccount, overrides makeTransfer. The makeTransfer method does not need to check a limit (because savings accounts do not have a limit). Instead, it calls makeTransfer directly in the superclass to create a transfer. It then checks whether the transfer is actually with the registered counter account.

All functionality is now exactly where it belongs. The part of making a transfer that is the same for all accounts is in the Account class, while the parts that are specific to certain types of accounts are in their respective classes. All duplication has been removed.

As the comments indicate, the makeTransfer method in the Account superclass has two responsibilities. Although the duplication introduced by copying and pasting CheckingAccount has already been resolved, one more refactoring—extracting the 11-test to its own method—makes the new Account class even more maintainable:

public class Account {
    public Transfer makeTransfer(String counterAccount, Money amount)
        throws BusinessException {
        if (isValid(counterAccount)) {
            CheckingAccount acct = Accounts.findAcctByNumber(counterAccount);
            return new Transfer(this, acct, amount);
        } else {
            throw new BusinessException("Invalid account number!");
        }
    }

    public static boolean isValid(String number) {
        int sum = 0;
        for (int i = 0; i < number.length(); i++) {
            sum = sum + (9 - i) * Character.getNumericValue(number.charAt(i));
        }
        return sum % 11 == 0;
    }
}

The Account class is now a natural place for isValid, the extracted method.

Common Objections to Avoiding Code Duplication

This section discusses common objections regarding code duplication. From our experience, these are developers’ arguments for allowing duplication, such as copying from other codebases, claiming there are “unavoidable” cases, and insisting that some code will “never change.”

Copying from Another Codebase Should Be Allowed

“Copying and pasting code from another codebase is not a problem because it will not create a duplicate in the codebase of the current system.”

Technically, that is correct: it does not create a duplicate in the codebase of the current system. Copying code from another system may seem beneficial if the code solves the exact same problem in the exact same context. However, in any of the following situations you will run into problems:

The other (original) codebase is still maintained: Your copy will not benefit from the improvements made in the original codebase. Therefore, do not copy, but rather import the functionality needed (that is, add the other codebase to your classpath).
The other codebase is no longer maintained and you are working on rebuilding this codebase: In this case, you definitely should not copy the code. Often, rebuilds are caused by maintainability problems or technology renewals. In the case of maintainability issues, you would be defeating the purpose by copying code. You are introducing code that is determined to be (on average) hard to maintain. In the case of technology renewals, you would be introducing limitations of the old technology into the new codebase, such as an inability to use abstractions that are needed for reusing functionality efficiently.

Slight Variations, and Hence Duplication, Are Unavoidable

“Duplication is unavoidable in our case because we need slight variations of common functionality.”

Indeed, systems often contain slight variations of common functionality. For instance, some functionality is slightly different for different operating systems, for other versions (for reasons of backward compatibility), or for different customer groups. However, this does not imply that duplication is unavoidable. You need to find those parts of the code that are shared by all variants and move them to a common superclass, as in the examples presented in this chapter. You should strive to model variations in the code in such a way that they are explicit, isolated, and testable.

This Code Will Never Change

“This code will never, ever change, so there is no harm in duplicating it.”

If it is absolutely, completely certain that code will never, ever change, duplication (and every other aspect of maintainability) is not an issue. For a start, you have to be absolutely, completely certain that the code in question also does not contain any bugs that need fixing. Apart from that, the reality is that systems change for many reasons, each of which may eventually lead to changes in parts deemed to never, ever change:

The functional requirements of the system may change because of changing users, changing behavior, or a change in the way the organization does business.
The organization may change in terms of ownership, responsibilities, development approach, development process, or legislative requirements.
Technology may change, typically in the system’s environment, such as the operating system, libraries, frameworks, or interfaces to other applications.
Code itself may change, because of bugs, refactoring efforts, or even cosmetic improvements.

That is why we argue that most of the time the expectation that code never changes is unfounded. So accepting duplication is really nothing more than accepting the risk that someone else will have to deal with it later if it happens.

Note

Your code will change. Really.

Duplicates of Entire Files Should Be Allowed as Backups

“We are keeping copies of entire files in our codebase as backups. Every backup is an unavoidable duplicate of all other versions.”

We recommend keeping backups, but not in the way implied by this objection (inside the codebase). Version control systems such as SVN and Git provide a much better backup mechanism. If those are not available, move backup files to a directory next to the root of the codebase, not inside it. Why? Because sooner or later you will lose track of which variant of a file is the right one.

Unit Tests Are Covering Me

“Unit tests will sort out whether something goes wrong with a duplicate.”

This is true only if the duplicates are in the same method, and the unit test of the method covers both. If the duplicates are in other methods, it can be true only if a code analyzer alerts you if duplicates are changing. Otherwise, unit tests would not necessarily signal that something is wrong if only one duplicate has changed. Hence, you cannot rely only on the tests (identifying symptoms) instead of addressing the root cause of the problem (using duplicate code). You should not assume that eventual problems will be fixed later in the development process, when you could avoid them altogether right now.

Duplication in String Literals Is Unavoidable and Harmless

“I need long string literals with a lot of duplication in them. Duplication is unavoidable and does not hurt because it is just in literals.”

This is a variant of one of the objections discussed in Chapter 2 (“This unit is impossible to split”). We often see code that contains long SQL queries or XML or HTML documents appearing as string literals in Java code. Sometimes such literals are complete clones, but more often parts of them are repeated. For instance, we have seen SQL queries of more than a hundred lines of code that differed only in the sorting order (order by asc versus order by desc). This type of duplication is not harmless even though technically they are not in the Java logic itself. It is also not unavoidable; in fact this type of duplication can be avoided in a straightforward fashion:

Extract to a method that uses string concatenation and parameters to deal with variants.
Use a templating engine to generate HTML output from smaller, nonduplicated fragments that are kept in separate files.

Lines of code categorized as …	Percentage allowed for 4 stars
… nonredundant	At least 95.4%
… redundant	At most 4.6%

Building Maintainable Software, Java Edition by Joost Visser, Sylvan Rigal, Rob van der Leek, Pascal van Eck, Gijs Wijnholds

Chapter 4. Write Code Once

Guideline:

Important

Types of Duplication

Motivation

Duplicated Code Is Harder to Analyze

Important

Duplicated Code Is Harder to Modify

How to Apply the Guideline

The Extract Superclass Refactoring Technique

Common Objections to Avoiding Code Duplication

Copying from Another Codebase Should Be Allowed

Slight Variations, and Hence Duplication, Are Unavoidable

This Code Will Never Change

Note

Duplicates of Entire Files Should Be Allowed as Backups

Unit Tests Are Covering Me

Duplication in String Literals Is Unavoidable and Harmless

See Also

Don’t leave empty-handed

It’s yours, free.

Check it out now on O’Reilly