Comparing Directory Trees

Engineers can be a paranoid sort (but you didn’t hear that from me). At least I am. It comes from decades of seeing things go terribly wrong, I suppose. When I create a CD backup of my hard drive, for instance, there’s still something a bit too magical about the process to trust the CD writer program to do the right thing. Maybe I should, but it’s tough to have a lot of faith in tools that occasionally trash files, and seem to crash my Windows 98 machine every third Tuesday of the month. When push comes to shove, it’s nice to be able to verify that data copied to a backup CD is the same as the original -- or at least spot deviations from the original -- as soon as possible. If a backup is ever needed, it will be really needed.

Because data CDs are accessible as simple directory trees, we are once again in the realm of tree walkers -- to verify a backup CD, we simply need to walk its top-level directory. We’ve already written a generic walker class (the visitor module), but it won’t help us here directly: we need to walk two directories in parallel and inspect common files along the way. Moreover, walking either one of the two directories won’t allow us to spot files and directories that only exist in the other. Something more custom seems in order here.

Finding Directory Differences

Before we start coding, the first thing we need to clarify is what it means to compare two directory trees. If both trees have exactly the same branch structure and depth, this ...

Get Programming Python, Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.