Extracting and Rearranging Datafile Columns

Problem

You want to pull out columns from a datafile or rearrange them into a different order.

Solution

Use a utility that can produce columns from a file on demand.

Discussion

cvt_file.pl serves as a tool that converts entire files from one format to another. Another common datafile operation is to manipulate a file's columns. This is necessary, for example, when importing a file into a program that doesn’t understand how to extract or rearrange input columns for itself. To work around this problem, you can rearrange the columns of the datafile instead.

Recall that this chapter began with a description of a scenario involving a 12-column CSV file somedata.csv from which only columns 2, 11, 5, and 9 were needed. You can convert the file to tab-delimited format like this:

% cvt_file.pl --iformat=csv somedata.csv > somedata.txt

But then what? If you just want to knock out a short script to extract those specific four columns, that’s fairly easy: write a loop that reads input lines and writes only the columns you want in the proper order. But that would be a special-purpose script, useful only within a highly limited context. With just a little more effort, it’s possible to write a more general utility yank_col.pl that enables you to extract any set of columns. With such a tool, you’d specify the column list on the command line like this:

% yank_col.pl --columns=2,11,5,9 somedata.txt > tmp.txt
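
The read-and-reorder loop described above can be sketched in a few lines. The following is an illustrative Python sketch, not the yank_col.pl script itself (which is written in Perl and not shown here). The function names parse_column_list and yank_columns are hypothetical, and the sketch assumes tab-delimited input with 1-based column numbers, matching the --columns=2,11,5,9 example:

```python
def parse_column_list(spec):
    """Turn a string such as "2,11,5,9" into a list of 1-based column numbers."""
    columns = [int(part) for part in spec.split(",")]
    if any(col < 1 for col in columns):
        raise ValueError("column numbers start at 1")
    return columns


def yank_columns(lines, columns, delimiter="\t"):
    """Yield each input line reduced to the requested columns, in the requested order.

    Assumption: every line has at least as many fields as the largest
    requested column number; a shorter line raises IndexError.
    """
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        # Convert 1-based column numbers to 0-based list indexes.
        yield delimiter.join(fields[col - 1] for col in columns)


# Example: a 12-column line whose fields are the letters a through l.
sample = "\t".join("abcdefghijkl") + "\n"
for row in yank_columns([sample], parse_column_list("2,11,5,9")):
    print(row)  # prints the fields b, k, e, i separated by tabs
```

Because the column list is passed in as data rather than written into the loop, the same function can extract or reorder any set of columns, and a column number can appear more than once if a field should be duplicated in the output.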

Because the script doesn’t use a hardcoded column list, it can be used to pull ...
