traverse function gives us more control than our original
betterFind function, it still has a
significant failing: we can avoid recursing into directories, but we
can’t filter other names until after we’ve generated the entire list of
names in a tree. If we are traversing a directory containing 100,000
files of which we care about only 3, we’ll allocate a 100,000-element
list before we have a chance to trim it down to the 3 we really
One approach would be to provide a filter
function as a new argument to
traverse, which we would apply to the list of
names as we generate it. This would allow us to allocate a list of only
as many elements as we need.
However, this approach also has a weakness. Say we know that we want at most 3 entries from our list, and that those 3 entries happen to be the first 3 of the 100,000 that we traverse. In this case, we’ll needlessly visit 99,997 other entries. This is not by any means a contrived example: for instance, the Maildir mailbox format stores a folder of email messages as a directory of individual files. It’s common for a single directory representing a mailbox to contain tens of thousands of files.
We can address the weaknesses of our two prior traversal functions by taking a different perspective: what if we think of filesystem traversal as a fold over the directory hierarchy?
The familiar folds,
foldl', neatly generalize the idea of traversing a list while accumulating a result. It’s ...