Sorting Large DBM Files

Problem

You want to process a large dataset you’d like to commit to a DBM file in a particular order.

Solution

Use the DB_File’s B-tree bindings and supply a comparison function of your own devising:

use DB_File;

# specify the Perl sub to do key comparison using the
# exported $DB_BTREE hash reference
$DB_BTREE->{'compare'} = sub {
    my ($key1, $key2) = @_ ;
    "\L$key1" cmp "\L$key2" ;
};

tie(%hash, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_BTREE)
    or die "can't tie $filename: $!";

Description

An annoyance of hashes, whether in memory or as DBM files, is that they do not maintain proper ordering. The CPAN module Tie::IxHash can make a regular hash in memory maintain its insertion order, but that doesn’t help you for DBM databases or arbitrary sorting criteria.

The DB_File module supports a nice solution to this using a B-tree implementation. One advantage of a B-tree over a regular DBM hash is its ordering. When the user defines a comparison function, all calls to keys, values, and each are automatically ordered. For example, Example 14.4 is a program that maintains a hash whose keys will always be sorted case-insensitively.

Example 14-4. sortdemo

#!/usr/bin/perl
#  sortdemo - show auto dbm sorting use strict; use DB_File; $DB_BTREE->{'compare'} = sub { my ($key1, $key2) = @_ ; "\L$key1" cmp "\L$key2" ; }; my %hash; my $filename = '/tmp/sorthash.db'; tie(%hash, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_BTREE) or die "can't tie $filename: $!"; my $i = 0; ...

Get Perl Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.