Extracting the First and Last Lines

It is sometimes useful to extract just a few lines from a text file—most commonly, lines near the beginning or the end. For example, the chapter titles for the XML files for this book are all visible in the first half-dozen lines of each file, and a peek at the end of job-log files provides a summary of recent activity.

Both of these operations are easy. You can display the first n records of standard input or each of a list of command-line files with any of these:

head -n n      [ file(s) ]

head -n        [ file(s) ]

awk 'FNR <= n' [ file(s) ]

sed -e nq      [ file(s) ]

sed nq         [ file(s) ]

POSIX requires a head option of -n 3 instead of -3, but every implementation that we tested accepts both.

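When there is only a single edit command, sed allows the -e option to be omitted. Unlike head and awk, however, sed treats all of its input files as one continuous stream, so sed nq prints only the first n lines of the combined input rather than the first n lines of each file.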

It is not an error if there are fewer than n lines to display.
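One way to check that the head, awk, and sed forms agree is to feed the same text to each of them. The sketch below assumes a POSIX shell. The helper name first_n_all is hypothetical, and it passes n to awk with -v instead of writing it literally into the program.

```shell
#!/bin/sh
# Hypothetical helper: print the first N lines of standard input three ways
# (head, awk, sed), separated by "--", so the results can be compared.
first_n_all() {
    n=$1
    input=$(cat)                 # save stdin so each command sees the same text
    printf '%s\n' "$input" | head -n "$n"
    echo '--'
    printf '%s\n' "$input" | awk -v n="$n" 'FNR <= n'
    echo '--'
    printf '%s\n' "$input" | sed "${n}q"
}

# Each command prints one, two, three.
printf '%s\n' one two three four five | first_n_all 3

# Fewer lines than requested is not an error: each command prints a and b.
printf '%s\n' a b | first_n_all 5
```

Note that awk 'FNR <= n' keeps reading to the end of its input, whereas head and sed nq can stop as soon as they have printed n lines.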

The last n lines can be displayed like this:

tail -n n      [ file ]

tail -n        [ file ]

As with head, POSIX specifies only the first form, but both are accepted on all of our systems.

Curiously, although head handles multiple files on the command line, traditional and POSIX tail do not. That nuisance is fixed in all modern versions of tail.
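The short script below exercises these points on two throwaway files. It is a sketch that assumes a POSIX shell plus the widely available mktemp command (long common on Unix systems, though not part of traditional POSIX) for a scratch directory. The file names a.txt and b.txt are arbitrary.

```shell
#!/bin/sh
# Build two small sample files in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 > "$dir/a.txt"
printf '%s\n' x y > "$dir/b.txt"

# Last two lines of a single file: prints 4 and 5.
tail -n 2 "$dir/a.txt"

# Asking for more lines than the file holds is not an error: prints x and y.
tail -n 10 "$dir/b.txt"

# A modern tail accepts several files and, like head, labels each
# one with a "==> name <==" header line.
tail -n 1 "$dir/a.txt" "$dir/b.txt"

rm -rf "$dir"
```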

In an interactive shell session, it is sometimes desirable to monitor output to a file, such as a log file, while it is still being written. The -f option asks tail to show the specified number of lines at the end of the file, and then to go into an endless loop, sleeping for a second before waking up and checking for more output to display. With -f, tail terminates only when you interrupt it, usually by typing Ctrl-C:

$ tail -n 25 -f /var/log/messages        Watch the growth of the system message log
...
^C                                       Ctrl-C stops tail

Since tail does not terminate on its own with the -f option, that option is unlikely to be of use in shell scripts.

There are no short and simple alternatives to tail with awk or sed, because the job requires maintaining a history of recent records.
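To see what that history involves, here is a minimal awk sketch of the idea. It is not a replacement for tail: the helper name awk_tail is hypothetical, it reads its files as one continuous stream, it requires n to be at least 1, and it stores the most recent n records in an array used as a ring buffer (slot NR % n), printing them only when the input ends.

```shell
#!/bin/sh
# Hypothetical helper: print the last N lines of the input using awk.
# Usage: awk_tail N [file ...]   (files are read as one continuous stream)
awk_tail() {
    n=$1
    shift
    # Reject a missing, non-numeric, or zero count (n is used as a modulus).
    [ "$n" -ge 1 ] 2>/dev/null || return 1
    awk -v n="$n" '
        { line[NR % n] = $0 }            # overwrite the oldest saved record
        END {
            start = (NR > n) ? NR - n + 1 : 1
            for (i = start; i <= NR; i++)
                print line[i % n]        # print the saved records in order
        }' "$@"
}

# Prints four and five.
printf '%s\n' one two three four five | awk_tail 2
```

However long the input is, the script holds only n records in memory, but it cannot print anything until it has seen the last line.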

Although we do not illustrate them in detail here, there are a few other commands that we use in small examples throughout the book, and that are worth adding to your toolbox:

  • dd copies data in blocks of user-specified size and number. It also has some limited ability to convert between uppercase and lowercase, and between ASCII and EBCDIC. For character-set conversions, however, the modern, POSIX-standard iconv command, which converts files from one code set to another, is much more flexible.

  • file compares a few selected leading bytes of each of its argument files against a pattern database and prints on standard output a brief one-line report of its conclusion for each file. Most vendor-provided implementations of file recognize 100 or so types of files, but they cannot classify binary executables and object files from other Unix flavors, or files from other operating systems. There is a much better open-source version,[9] however, that has enjoyed the benefits of many contributors: it can recognize more than 1200 file types, including many from non-Unix operating systems.

  • od, the octal dump command, prints byte streams in ASCII, octal, and hexadecimal. Command-line options can set the number of bytes read and can select the output format.

  • strings searches its input for sequences of four or more printable characters ending with a newline or a NUL, and prints them on standard output. It is often useful for peeking inside binary files, such as compiled programs or datafiles. Desktop-software, image, and sound files sometimes contain useful textual data near the beginning, and GNU head provides the handy -c option to limit the output to a specified number of characters:

$ strings -a horne01.jpg | head -c 256 | fmt -w 65  Examine astronomical image
JFIF Photoshop 3.0 8BIM Comet Hale-Bopp shows delicate
filaments in it's blue ion tail in this exposure made Monday
morning 3/17/97 using  12.5 inch F/4 Newtonian reflecting
telescope. The 15 minute exposure was made on Fujicolor SG-800
Plus film. 8BIM 8BI
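Several of these tools can be tried on a few bytes produced with printf. The sketch below assumes a POSIX shell and dd, a head that accepts -c (GNU head does, as noted above), and an iconv that knows the encoding names UTF-8 and ISO-8859-1. Encoding names are implementation-defined, so check iconv -l on your system. file and strings are left out because their output depends on the implementation and the input file.

```shell
#!/bin/sh
# dd: conv=ucase maps lowercase letters to uppercase.
# dd reports its record counts on standard error, so discard them.
printf 'hello, world\n' | dd conv=ucase 2>/dev/null     # HELLO, WORLD

# iconv: convert "café" from UTF-8 to ISO-8859-1 (Latin-1), then let od
# show the bytes; the two-byte UTF-8 é becomes the single byte e9.
printf 'caf\303\251\n' | iconv -f UTF-8 -t ISO-8859-1 | od -An -tx1

# od: the same short string shown as characters and as hex bytes,
# with -An suppressing the offset column.
printf 'Hi\n' | od -An -c
printf 'Hi\n' | od -An -tx1

# head -c: keep only the first 5 bytes of the input (no trailing newline).
printf 'abcdefgh\n' | head -c 5
```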
