Processing Fixed-Length Records

Problem

You need to read and process data that is in a fixed-length (also called fixed-width) form.

Solution

Use Perl or gawk 2.13 or greater. Given a file like:

$ cat fixed-length_file
Header1-----------Header2-------------------------Header3---------
Rec1 Field1       Rec1 Field2                     Rec1 Field3
Rec2 Field1       Rec2 Field2                     Rec2 Field3
Rec3 Field1       Rec3 Field2                     Rec3 Field3

You can process it with GNU’s gawk by setting FIELDWIDTHS to the correct field lengths, setting OFS as desired, and making an assignment so that gawk rebuilds the record (see the awk trick in Trimming Whitespace). However, gawk does not remove the spaces used to pad the original record, so we use two gsubs to do that: one for all the internal fields and the other for the last field in each record. Finally, we just print. Note that → denotes a literal tab character in the output. The output is a little hard to read, so there is a hex dump as well. Recall that ASCII tab is 09 while ASCII space is 20.

$ gawk ' BEGIN { FIELDWIDTHS = "18 32 16"; OFS = "\t" }
    { $1 = $1; gsub(/ +\t/, "\t"); gsub(/ +$/, ""); print }' fixed-length_file
Header1----------- → Header2------------------------- → Header3---------
Rec1 Field1 → Rec1 Field2 → Rec1 Field3
Rec2 Field1 → Rec2 Field2 → Rec2 Field3
Rec3 Field1 → Rec3 Field2 → Rec3 Field3

$ gawk ' BEGIN { FIELDWIDTHS = "18 32 16"; OFS = "\t" }
    { $1 = $1; gsub(/ +\t/, "\t"); gsub(/ +$/, ""); print }' fixed-length_file | hexdump -C
00000000 48 65 61 64 65 72 31 2d 2d 2d 2d ...
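
If gawk is not available, the same splitting can be approximated in bash itself. The following is a minimal sketch and not part of the original solution: the function name fixed_to_tsv is hypothetical, and the widths 18, 32, and 16 are simply the FIELDWIDTHS values used above. It relies on bash substring expansion (${var:offset:length}) to cut out each field and on pattern removal to strip the trailing padding.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an illustration, not from the recipe):
# read fixed-width records on stdin, write tab-separated fields on stdout.
fixed_to_tsv() {
    # Field widths assumed to match the sample file: 18, 32, and 16 characters.
    local widths=(18 32 16)
    local line field out sep offset width

    # The "|| [[ -n $line ]]" part also handles a final line with no newline.
    while IFS= read -r line || [[ -n $line ]]; do
        out=''
        sep=''
        offset=0
        for width in "${widths[@]}"; do
            # Cut the field out of the record by position.
            field=${line:offset:width}
            # Strip trailing spaces: ${field##*[! ]} is the run of spaces
            # after the last nonspace character, which % then removes.
            field=${field%"${field##*[! ]}"}
            out+=$sep$field
            sep=$'\t'
            offset=$(( offset + width ))
        done
        printf '%s\n' "$out"
    done
}
```

You might call it as fixed_to_tsv < fixed-length_file. A pure bash loop like this is likely to be much slower than gawk on large files, so treat it as a fallback rather than a replacement.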
