Recognizing Two Names for the Same File

Problem

You want to determine whether two filenames in a list refer to the same file on disk. Because of hard and soft links, two filenames can name a single file. You might do this to make sure that you don’t change a file you’ve already worked with.

Solution

Maintain a hash, keyed by the device and inode number of the files you’ve seen. The values count how many times each file has been seen:

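my %seen;

sub do_my_thing {
    my $filename = shift;
    my ($dev, $ino) = stat $filename;
    unless (defined $ino) {
        # stat failed, so there is no device/inode pair to key on
        warn "Can't stat $filename: $!\n";
        return;
    }

    unless ($seen{$dev, $ino}++) {
        # do something with $filename because we haven't
        # seen it before
    }
}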

Discussion

A key in %seen is made by combining the device number ($dev) and inode number ($ino) of each file. Files that are the same will have the same device and inode numbers, so they will have the same key.
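
To make this concrete, here is a minimal sketch that creates a file, gives it a second name with a hard link, and compares the device and inode numbers of each name. It assumes a Unix-like filesystem that supports hard links. The filenames and the dev_ino helper are hypothetical and exist only for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical filenames inside a throwaway directory.
my $dir      = tempdir(CLEANUP => 1);
my $original = File::Spec->catfile($dir, 'original.txt');
my $alias    = File::Spec->catfile($dir, 'alias.txt');
my $other    = File::Spec->catfile($dir, 'other.txt');

# Create two distinct files.
for my $path ($original, $other) {
    open(my $fh, '>', $path) or die "Can't create $path: $!";
    print {$fh} "data\n";
    close($fh) or die "Can't close $path: $!";
}

# Give the first file a second name (a hard link).
link($original, $alias) or die "Can't link $alias to $original: $!";

# Illustrative helper: return a "device,inode" string for a path.
sub dev_ino {
    my $path = shift;
    my ($dev, $ino) = stat($path) or die "Can't stat $path: $!";
    return "$dev,$ino";
}

my $same      = dev_ino($original) eq dev_ino($alias);
my $different = dev_ino($original) ne dev_ino($other);

print "original and alias name the same file\n"  if $same;
print "original and other are different files\n" if $different;
```

Because stat follows symbolic links, a soft link to the file should yield the same pair as well; lstat would report the link itself instead.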

If you want to maintain a list of all the names that refer to the same file, instead of counting the number of times it has been seen, save each name in an anonymous array.

foreach my $filename (@files) {
    my ($dev, $ino) = stat $filename or next;   # skip names that can't be stat'ed
    push @{ $seen{$dev, $ino} }, $filename;
}

foreach my $devino (sort keys %seen) {
    my ($dev, $ino) = split(/$;/, $devino);
    if (@{ $seen{$devino} } > 1) {
        # @{ $seen{$devino} } is a list of filenames for the same file
    }
}

The $; variable holds the separator string used by the old multidimensional associative array emulation syntax, $hash{$x,$y,$z}. The result is still a one-dimensional hash, but it has composite keys. The key is really join($; => $x, $y, $z). The split separates them ...
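
The composite-key behavior can be checked with a short sketch. The values of $x, $y, and $z are arbitrary and chosen only for illustration. The sketch assumes the multidimensional emulation syntax is enabled, which is the default unless a newer feature bundle (such as use v5.36) turns it off.

```perl
use strict;
use warnings;

my %hash;
my ($x, $y, $z) = (8, 1234, 'c');   # arbitrary example values

# Store one entry using the multidimensional emulation syntax.
$hash{$x, $y, $z} = 'value';

# The stored key is the three parts joined with $; (by default "\034").
my ($key)    = keys %hash;
my $expected = join($;, $x, $y, $z);
print "composite key matches join\n" if $key eq $expected;

# Split on the separator to recover the parts. quotemeta guards
# against a separator that contains regex metacharacters.
my $sep   = quotemeta $;;
my @parts = split /$sep/, $key;
print "parts: @parts\n";
```

One consequence of this design is that the parts themselves must not contain the separator string, or the split will produce extra fields.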
