Processing Fixed-Length Records
Problem
You need to read and process data that is in a fixed-length (also called fixed-width) form.
Solution
Use Perl or gawk 2.13 or greater. Given a file like:
$ cat fixed-length_file
Header1-----------Header2-------------------------Header3---------
Rec1 Field1       Rec1 Field2                     Rec1 Field3     
Rec2 Field1       Rec2 Field2                     Rec2 Field3     
Rec3 Field1       Rec3 Field2                     Rec3 Field3     
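If you want to experiment with the commands, you can rebuild the sample file with printf. This is a minimal sketch that assumes the columns are 18, 32, and 16 characters wide, which are the widths FIELDWIDTHS is set to in the gawk command. The %-Ns format left-justifies each value and pads it with spaces to exactly N characters, which is how a fixed-length record is laid out.

```shell
# Rebuild the sample fixed-length_file (assumed widths: 18, 32, 16).
# %-Ns left-justifies a value and pads it with trailing spaces to N
# characters; it does not truncate, so each value must fit its column.
{
    printf '%-18s%-32s%-16s\n' 'Header1-----------' 'Header2-------------------------' 'Header3---------'
    for rec in 1 2 3; do
        printf '%-18s%-32s%-16s\n' "Rec$rec Field1" "Rec$rec Field2" "Rec$rec Field3"
    done
} > fixed-length_file
```

Every line in the result is 18 + 32 + 16 = 66 characters long. The data lines end in five spaces of padding, and the last gsub in the recipe removes that padding.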
You can process it with GNU's gawk. Set FIELDWIDTHS to the correct field lengths, set OFS as desired, and make an assignment ($1 = $1) so that gawk rebuilds the record using OFS (see the awk trick in Trimming Whitespace). However, gawk does not remove the spaces used to pad the fields of the original record, so we use two gsubs to do that: one for all the internal fields and the other for the last field in each record. Finally, we just print. Note that → denotes a literal tab character in the output. Because the output is a little hard to read, a hex dump follows as well. Recall that an ASCII tab is hex 09, while an ASCII space is hex 20.
$ gawk ' BEGIN { FIELDWIDTHS = "18 32 16"; OFS = "\t" } { $1 = $1; gsub(/ +\t/, "\t"); gsub(/ +$/, ""); print }' fixed-length_file
Header1-----------→Header2-------------------------→Header3---------
Rec1 Field1→Rec1 Field2→Rec1 Field3
Rec2 Field1→Rec2 Field2→Rec2 Field3
Rec3 Field1→Rec3 Field2→Rec3 Field3

$ gawk ' BEGIN { FIELDWIDTHS = "18 32 16"; OFS = "\t" } { $1 = $1; gsub(/ +\t/, "\t"); gsub(/ +$/, ""); print }' fixed-length_file | hexdump -C
00000000  48 65 61 64 65 72 31 2d  2d 2d 2d ...