Reformatting Paragraphs

Most powerful text editors provide commands that make it easy to reformat paragraphs by changing line breaks so that lines do not exceed a width that is comfortable for a human to read; we used such commands a lot in writing this book. Sometimes you need to do this to a data stream in a shell script, or inside an editor that lacks a reformatting command but does have a shell escape. In this case, fmt is what you need. Although POSIX makes no mention of fmt, you can find it on every current flavor of Unix; if you have an older system that lacks fmt, simply install the GNU coreutils package.

Although some implementations of fmt have more options, only two find frequent use: -s means split long lines only, but do not join short lines to make longer ones, and -w n sets the output line width to n characters (the default is usually about 75). Here are some examples with chunks of a spelling dictionary that has just one word per line:

$ sed -n -e 9991,10010p /usr/dict/words | fmt        Reformat 20 dictionary words
Graff graft graham grail grain grainy grammar grammarian grammatic
granary grand grandchild grandchildren granddaughter grandeur grandfather
grandiloquent grandiose grandma grandmother

$ sed -n -e 9995,10004p /usr/dict/words | fmt -w 30  Reformat 10 words into short lines
grain grainy grammar
grammarian grammatic
granary grand grandchild
grandchildren granddaughter

If your system does not have /usr/dict/words, then it probably has an equivalent file named /usr/share/dict/words or /usr/share/lib/dict/words.
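
A script that needs the word list can probe for it rather than hardcode one path. The following is a minimal sketch, assuming only the three locations named above; find_wordlist is a hypothetical helper name of our own, not a standard command:

```shell
# find_wordlist is a hypothetical helper: print the first argument that
# names a readable regular file, or return 1 if none of them does.
find_wordlist() {
    for f in "$@"; do
        if [ -f "$f" ] && [ -r "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

# The candidates are the three locations mentioned in the text.
if words=$(find_wordlist /usr/dict/words \
                         /usr/share/dict/words \
                         /usr/share/lib/dict/words)
then
    sed -n -e 9991,10010p "$words" | fmt
else
    echo "no word list found" >&2
fi
```

Because word lists differ from system to system, the words printed for a given line range will probably not match the ones shown here.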

The split-only option, -s, is helpful in wrapping long lines while leaving short lines intact, and thus minimizing the differences from the original version:

$ fmt -s -w 10 << END_OF_DATA                        Reformat long lines only
> one two three four five
> six
> seven 
> eight
> END_OF_DATA
one two
three
four five
six
seven
eight
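
To see what -s buys you, it can help to run the same input through fmt with and without the option. This is only a sketch: the sample text is made up for illustration, and it assumes an fmt whose -s option is split-only, as described above (GNU coreutils fmt behaves this way). The exact line breaks in the second run depend on the implementation, so we show no expected output:

```shell
# Hypothetical sample: one overlong line followed by three short ones.
sample='one two three four five
six
seven
eight'

# Split-only: the long line is broken, the short lines are left alone.
printf '%s\n' "$sample" | fmt -s -w 10

echo '--'

# Default mode: fmt is also free to join the short lines together.
printf '%s\n' "$sample" | fmt -w 10
```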

Warning

You might expect that you could split an input stream into one word per line with fmt -w 0, or remove line breaks entirely with a large width. Unfortunately, fmt implementations vary in behavior:

  • Older versions of fmt lack the -w option; they use -n to specify an n-character width.

  • All reject a zero width, but accept -w 1 or -1.

  • All preserve leading space.

  • Some preserve lines that look like mail headers.

  • Some preserve lines beginning with a dot (troff typesetter commands).

  • Most limit the width. We found peculiar upper bounds of 1021 (Solaris), 2048 (HP/UX 11), 4093 (AIX and IRIX), 8189 (OSF/1 4.0), 12285 (OSF/1 5.1), and 2147483647 (largest 32-bit signed integer: FreeBSD, GNU/Linux, and Mac OS).

  • The NetBSD and OpenBSD versions of fmt have a different command-line syntax, and apparently allocate a buffer to hold the output line, since they give an out-of-memory diagnostic for large width values.

  • IRIX fmt is found in /usr/sbin, a directory that is unlikely to be in your search path.

  • HP/UX before version 11.0 did not have fmt.

These variations make it difficult to use fmt in portable scripts, or for complex reformatting tasks.
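
For the two jobs mentioned at the start of this warning, one possible workaround is to bypass fmt and let awk do the word splitting. The following is a sketch under our own assumptions, not a replacement for fmt: the function names one_word_per_line and join_lines are hypothetical, and unlike fmt, neither function preserves leading space or blank lines between paragraphs:

```shell
# One word per line: awk splits each input line into fields on runs of
# spaces and tabs, and each field is printed on a line of its own.
one_word_per_line() {
    awk '{ for (i = 1; i <= NF; i++) print $i }'
}

# Remove line breaks: print every word on a single output line, with one
# space between words. sep is empty until the first word has been printed.
join_lines() {
    awk '{ for (i = 1; i <= NF; i++) { printf "%s%s", sep, $i; sep = " " } }
         END { if (sep != "") print "" }'
}

# Both functions read standard input:
printf 'one two three\nfour\n' | one_word_per_line
printf 'one two three\nfour\n' | join_lines
```

Both functions rely only on awk features that POSIX specifies, so they avoid the width limits and option differences listed above.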
