Software Tools Principles

Over the course of time, a set of core principles developed for designing and writing software tools. You will see these exemplified in the programs used for problem solving throughout this book. Good software tools should do the following things:

Do one thing well

In many ways, this is the single most important principle to apply. Programs that do only one thing are easier to design, easier to write, easier to debug, and easier to maintain and document. For example, a program like grep that searches files for lines matching a pattern should not also be expected to perform arithmetic.

A natural consequence of this principle is a proliferation of smaller, specialized programs, much as a professional carpenter has a large number of specialized tools in his toolbox.
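
For instance (the filename and its field layout are invented for this sketch), rather than expecting grep to total anything, you can let grep do the matching and hand the arithmetic to awk:

# grep does one thing: select the matching lines.
# awk, a separate tool, does the arithmetic on them.
grep 'Total:' expenses.txt | awk '{ sum += $2 } END { print sum }'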

Process lines of text, not binary

Lines of text are the universal format in Unix. Datafiles containing text lines are easy to process when writing your own tools, they are easy to edit with any available text editor, and they are portable across networks and multiple machine architectures. Using text files facilitates combining any custom tools with existing Unix programs.
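
As a small illustration, /etc/passwd stores one account per line as colon-separated text fields, so existing tools can query it directly, with no special-purpose code:

# Plain text, one record per line: cut extracts the login-name field
# and sort orders the result.
cut -d : -f 1 /etc/passwd | sort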

Use regular expressions

Regular expressions are a powerful mechanism for working with text. Understanding how they work and using them properly simplifies your script-writing tasks.

Furthermore, although regular expressions varied across tools and Unix versions over the years, the POSIX standard provides only two kinds of regular expressions, with standardized library routines for regular-expression matching. This makes it possible for you to write your own tools that work with regular expressions identical to those of grep (called Basic Regular Expressions or BREs by POSIX), or identical to those of egrep (called Extended Regular Expressions or EREs by POSIX).
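
As a brief sketch (datafile is a placeholder name), here is the same match written both ways, first as an ERE for grep -E (or egrep) and then as a BRE for plain grep:

# ERE: + means "one or more of the preceding item."
grep -E '[0-9]+$' datafile

# BRE: the equivalent match uses an interval expression instead.
grep '[0-9]\{1,\}$' datafile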

Default to standard I/O

When not given any explicit filenames upon which to operate, a program should default to reading data from its standard input and writing data to its standard output. Error messages should always go to standard error. (These are discussed in Chapter 2.) Writing programs this way makes it easy to use them as data filters—i.e., as components in larger, more complicated pipelines or scripts.
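
Here is a minimal sketch of such a filter (the script name and its job are invented). With filename arguments it processes those files; with none, "$@" expands to nothing and cat falls back to reading standard input:

#! /bin/sh
# toupper --- hypothetical filter that uppercases its input.
# With file arguments, cat reads them; with none, cat reads standard input,
# so the script works standalone or in the middle of a pipeline.
cat "$@" | tr '[:lower:]' '[:upper:]'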

Don't be chatty

Software tools should not be "chatty." No "starting processing," "almost done," or "finished processing" kinds of messages should be mixed in with the regular output of a program (or at least, not by default).

When you consider that tools can be strung together in a pipeline, this makes sense:

tool_1 < datafile | tool_2 | tool_3 | tool_4 > resultfile

If each tool produced "yes, I'm working" kinds of messages and sent them down the pipe, the data being manipulated would be hopelessly corrupted. Furthermore, even if each tool sent its messages to standard error, the screen would be full of useless progress messages. When it comes to tools, no news is good news.
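
When a tool genuinely must speak up, because of an error rather than mere progress, the message belongs on standard error, where it cannot contaminate the data stream. A hypothetical line from inside such a script:

# The diagnostic is redirected to standard error with >&2, keeping the
# data on standard output clean for the next stage of the pipeline.
echo "tool_2: cannot open auxiliary file" >&2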

This principle has a further implication. In general, Unix tools follow a "you asked for it, you got it" design philosophy. They don't ask "are you sure?" kinds of questions. When a user types rm somefile, the Unix designers figured that he knows what he's doing, and rm removes the file, no questions asked.[5]

Generate the same output format accepted as input

Specialized tools that expect input in a certain format (header lines followed by data lines, lines with particular field separators, and so on) should produce output that follows the same rules as the input. This makes it easy to feed the results of one program run into a different run, perhaps of another program or with different options.

For example, the netpbm suite of programs[6] manipulates image files stored in a Portable BitMap format.[7] These files contain bitmapped images, described using a well-defined format. Each tool reads PBM files, manipulates the contained image in some fashion, and then writes a PBM-format file back out. This makes it easy to construct a simple pipeline to perform complicated image processing, such as scaling an image, then rotating it, and then decreasing the color depth.
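
Such a pipeline might look like the following sketch. The filenames are placeholders, and the exact tool names and options vary between netpbm releases (older installations provide pnmscale and ppmquant instead, for example):

# Scale the image to half size, rotate it 90 degrees, and reduce it
# to 16 colors; each stage reads and writes the same family of formats.
pamscale 0.5 photo.ppm | pamflip -r90 | pnmquant 16 > small.ppm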

Let someone else do the hard part

Often, while there may not be a Unix program that does exactly what you need, it is possible to use existing tools to do 90 percent of the job. You can then, if necessary, write a small, specialized program to finish the task. Doing things this way can save a large amount of work compared to solving each problem from scratch every time.
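
As a hypothetical illustration (the filename and the threshold are invented), suppose you need a list of every word that appears more than 100 times in a document. sort and uniq -c already do the hard part, ordering the data and counting duplicates, leaving only a one-line awk program to write:

# tr splits the text into one word per line, sort and uniq -c do the
# heavy lifting, and a short awk program finishes the last 10 percent.
tr -cs 'A-Za-z' '\n' < document | sort | uniq -c | awk '$1 > 100 { print $2 }'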

Detour to build specialized tools

As just described, when there just isn't an existing program that does what you need, take the time to build a tool to suit your purposes. However, before diving in to code up a quick program that does exactly your specific task, stop and think for a minute. Is the task one that other people are going to need done? Is it possible that your specialized task is a specific case of a more general problem that doesn't have a tool to solve it? If so, think about the general problem, and write a program aimed at solving that. Of course, when you do so, design and write your program so it follows the previous rules! By doing this, you graduate from being a tool user to being a toolsmith, someone who creates tools for others!



[5] For those who are really worried, the -i option to rm forces rm to prompt for confirmation, and in any case rm prompts for confirmation when asked to remove suspicious files, such as those whose permissions disallow writing. As always, there's a balance to be struck between the extremes of never prompting and always prompting.

[6] The programs are not a standard part of the Unix toolset, but are commonly installed on GNU/Linux and BSD systems. The WWW starting point is http://netpbm.sourceforge.net/. From there, follow the links to the Sourceforge project page, which in turn has links for downloading the source code.

[7] There are three different formats; see the pnm(5) manpage if netpbm is installed on your system.
