Introduction to Duper

I have loads of duplicate files littering my computers. In an effort to tame this, let’s write a duplicate-file finder. We’ll call it Duper (so we can later create a paid version called SuperDuper). It’ll work by scanning all the files in a directory tree, calculating a hash for each. If two files have the same hash, we’ll report them as duplicates.
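To make the idea concrete before we start asking design questions, here is a minimal single-process sketch of scan, hash, and compare. It is not the design this chapter builds. The module name DuperSketch, the function names, the choice of SHA-256, and the 64 KB read size are all assumptions made for illustration. It also ignores symlink loops and unreadable files.

```elixir
defmodule DuperSketch do
  # Hypothetical illustration only: walks a tree, hashes every regular
  # file, and returns groups of paths whose contents hash the same.

  # Returns a list of lists; each inner list holds two or more paths
  # that share a hash.
  def find_duplicates(root) do
    root
    |> list_files()
    |> Enum.group_by(&hash_file/1)
    |> Map.values()
    |> Enum.filter(fn paths -> length(paths) > 1 end)
  end

  # Recursively collects regular files below `path`.
  # Directories we cannot list are silently skipped (an assumption).
  defp list_files(path) do
    cond do
      File.regular?(path) ->
        [path]

      File.dir?(path) ->
        case File.ls(path) do
          {:ok, entries} ->
            Enum.flat_map(entries, fn entry -> list_files(Path.join(path, entry)) end)

          {:error, _reason} ->
            []
        end

      true ->
        []
    end
  end

  # Hashes a file in fixed-size chunks so that a very large file is
  # never held in memory all at once.
  defp hash_file(path) do
    File.open!(path, [:read, :binary], fn io ->
      io
      |> hash_chunks(:crypto.hash_init(:sha256))
      |> :crypto.hash_final()
    end)
  end

  defp hash_chunks(io, state) do
    case IO.binread(io, 65_536) do
      :eof -> state
      chunk when is_binary(chunk) -> hash_chunks(io, :crypto.hash_update(state, chunk))
    end
  end
end
```

Calling DuperSketch.find_duplicates("some/directory") would return something like [["some/directory/a.txt", "some/directory/sub/b.txt"]] if those two files had identical contents. Everything here runs sequentially in one process, and the full map of hashes lives in memory, which is exactly the kind of decision the questions that follow are meant to examine.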

Let’s start asking the questions.

Q1: What is the environment and what are its constraints?

We’re going to run this on a typical computer. It’ll have roughly two orders of magnitude more file storage than main memory. Files will range in size from 10⁰ to 10¹⁰ bytes, and there will be roughly 10⁷ of them.

What this means:

We need to allow for the fact that although we have to load ...
