Chapter 4. Organizing and Naming Files and Folders

In this chapter and the next, I present strategies for organizing and storing your digital assets. I find it’s very helpful to break this discussion into two topics: the directory structure and the hardware configuration. The directory structure is the organization of files into a hierarchy of folders. It should be stable and scalable; it should not fundamentally change, only be added to. The hardware configuration is simply the configuration of the media that happen to be hosting the directory structure at any given time, and this will change as you add storage or move to entirely new media.

Organizing Your Image Files: An Overview
Working File Directory Structure
Archive Directory Structure: An Overview
Archive Directory Structure: Making Buckets
The Fine Art of File Naming

Organizing Your Image Files: An Overview

In the previous chapter, we saw you can use metadata to organize your images in robust and flexible ways. In this chapter, we’ll see how to name your files and how to divide them into a folder structure that takes the data lifecycle into account. Let’s start with a basic overview of the organizational structure and tools, and then move to some specific recommendations.

Folders: What Are They Good For?

In the physical filing world, folders are very useful for organizing lots of documents. Since a piece of physical film or paper can be in only one place at a time, you can create a highly structured filing system using folders. Computers have used the folder metaphor for decades as a way to help users keep track of where something can be found.

Folders are very limited as an organizational structure, however. There must be one top-level organizational construct, which can only be subdivided in a limited way before the system becomes too cumbersome and breaks down. Is it most important to divide by date, client, project, subject matter, rating, or usage? Furthermore, information that is dependent on folder structure is very fragile. If you remove an image from a folder that designates what that image is, that content information can be lost (Figure 4-1).

Figure 4-1. One of the biggest problems with using folders for organization is that as soon as you remove the image from the folder the information is removed from the image.

However, folders provide an ideal tool for managing the data itself. I suggest you use folders principally for storage, rather than for organization. By storage, I mean containing the images, putting them away, moving them around, and other handling issues, as distinct from the organization of the images, which is best accomplished by using metadata. In the physical world, storage and organizational structure are often inseparable. In the digital world, we are not so constrained. We’ll see how you can use folders in a simple, straightforward way to stack files up so that you can back them up, validate them, and restore them in the event of a problem. This does not mean that folder naming is irrelevant as a content-organizing tool; it means that content organizing is the folders’ secondary job.

As we saw in the previous chapter, metadata does the heavy lifting for organizational purposes. Letting go of a filing hierarchy based on image content can be difficult, but once you surrender yourself to the concept that the directory structure does not have to be the ultimate keeper of important content data about your images, a very simple preferred data structure emerges—a structure that leverages the particular qualities of digital media. It saves time, reduces confusion, and will really pay off when you hit one of those practically inevitable bumps down the road (such as drive failure or format migration).

Metadata and cataloging software offer much more flexibility and efficiency for organizing than folders do. Cataloging software allows us to organize the same set of files in numerous ways. You can view an entire archive according to multiple criteria, including date created, commissioning client, portfolio images, quality rating, subject pictured, and, yes, where it is stored.

Cataloging software frees us from having to use the directory structure to identify the content of our images. This is a good thing, because you should organize the directory structure with other issues in mind:

It should be scalable in a way that will meet the challenges of decades of storage.
It should be simple: quick to scan and quick to add to.
It should be designed so that it is easy to determine if a file has been backed up, and easy to restore in the event of a problem.

Digging Deeper

Folders Are Metadata, Too

Folders are metadata on two levels, one more obvious than the other:

The directory structure that your images are stored within becomes metadata to your cataloging software, and is stored alongside keywords, file size, and other data associated with the file. It’s just another piece of information that the catalog remembers about the files.
The folder you see on your screen is really a metaphor presented to you by the operating system. The bits of the file aren’t really in a folder on the hard drive—in fact, they’re probably not even grouped together on the drive at all. The computer’s file system contains a volume record that remembers where all the bits of each file have been saved to the drive. The folders that you see are just a way to help you make sense of the files that are in the filesystem; they are a representation of the data—metadata.

You Might Be Wondering ...

You Keep Using the Word “Scalable”—What’s Up with That?

It’s not a fish thing, and it’s not about how much you weigh. The term scalable is an IT-geek term referring to the ability of a system to grow with you. In fact, it’s not hard to imagine how the growth of your digital archives could make many of the tools you currently use obsolete.

The most obvious tool that will break with archive growth is the comprehensiveness of your “virtual metadata” (my term for a photographer’s memory about the contents of her images). While it may be easy to remember what’s in your archive when there are only a few thousand recent images, this will become much more difficult when you’ve amassed a 20-year archive of one million photographs.

You may also hit scalability limits with numbering systems, storage configurations, and software capability. Everything I present here has been designed with an eye toward the inevitable scaling challenges presented by long-term archiving.

Consistent Directory Structure Is Your Friend

Figure 4-2. My word processing documents are organized so that I can move this folder of subfolders to a new machine and keep my files organized in exactly the same way.

You have probably upgraded your computer before, and had to transfer your saved email files or word processing documents from one machine to another. If you had all your documents in a folder—or perhaps a folder full of subfolders, as shown in Figure 4-2—this part of the upgrade process was probably not too difficult because all you had to do was copy the parent folder from one machine to another. The directory structure (the way the files are stored) stayed constant, even though it was moved to an entirely new data storage device (the hard drive in the new computer).

You’ll want to structure your image storage system so that you can move it as easily as your folder full of documents, and you’ll want to be able to find things when you’re done. The directory structure for your image files should be simple, unified, and scalable, and it should be independent of the storage medium on which it happens to be hosted. Essentially, what we will be designing in this chapter is a structure to hold all your image files and a way to name the files within that structure.

You Might Be Wondering ...

What If I’ve Got an IT Department to Handle Data Management?

The structure described in this chapter is geared to people who want to create and maintain an archive, but who do not have an IT department to handle the data management. The system described here integrates a scalable and secure storage method with hardware that is affordable for the individual or small business.

If you have an IT department that can handle the data that your digital photography creates, then some of what’s described here is not necessary. Write-once media buckets, in particular, are probably not necessary. You have more freedom to configure the directory structure any way you want, since you won’t need to leverage it to help you back up and restore the archive—that’s someone else’s job.

In consulting with a number of companies, however, I have found that most of the time IT departments (even very good ones) are not equipped to deal with the large amount of data created by digital photography. A digital media collection may be many times larger than all other data in the company combined. In some of these cases, the photography group in the company has implemented a system like the one I describe in this book, running entirely outside the regular IT structure.

Designing a Unified Directory Structure

The most basic directory structure—an alphabetized list of folders, for instance—is fine for small text files, but when you try to apply this structure to a large data set of digital photographs, it breaks down. For example, it can break down when the image collection grows too large for one volume. Additionally, the alphabetized directory structure mixes the old and new work in ways that are hard to back up and restore reliably. In order to design a unified directory structure for a digital photography archive, we have to consider a number of factors:

What kinds of files go together?
When are images added to the permanent structure?
How do you configure the structure to be scalable?
How does the structure relate to the storage medium?
How do we back up, validate, and restore files?

We’re going to use the tools of data lifecycle management outlined in Chapter 1 to help configure a directory structure that addresses these questions. Ironically, in order to build a unified structure, we must first look at how to divide up the kinds of data we need to store.

You Might Be Wondering ...

What’s an Original and What’s a Derivative File?

In the world of film-based imaging, these terms were clear: the film that went through the camera was the original, and any print, scan, or film duplicate of the original was a derivative file (of course, we didn’t use that term—a derivative was simply called a print, a scan, or a dupe).

In the world of digital imaging, it’s a little different. Since you can make an unlimited number of perfect copies of an image file, we need a new way to think about successive copies of files. A copy of the original file that is unchanged in any way is actually a second original file, and it can also be referred to as an original. A copy of the file that has been manipulated and no longer contains the original information in its original form, on the other hand, is a derivative file.

At this point, you may be wondering about raw files that have been converted to DNG files. Are they not manipulated versions of the original that don’t contain all the original image information in the original form? While it’s true that the format has changed, all the original image information is still there, so the DNG becomes the functional original—henceforth referred to as the original. As described in Chapter 2, DNG files can be manipulated again and again as though they were the raw originals. You can think of them as original image files that have been placed in a job jacket with some paperwork and a pretty good print of the image.

A JPEG file that is a camera original is a slightly different story. To remain as an original, the image data must be unchanged. You can duplicate the file, and it will remain an original. But if you open and resave the file—even without editing it—you will reapply the JPEG compression and degrade the image.

There are only a few things that you can do to an original JPEG file that do not adversely affect the image quality. These tasks have to do with metadata. Most DAM programs have the ability to add or replace metadata in the file without opening and resaving it. Many also have the ability to “losslessly rotate” a JPEG (also called soft rotation). This is also done with metadata, but a more obscure kind: there is a field in the header of the file that you can use to instruct any program that is displaying or opening the file to rotate it, and you can change this header information without altering the image data. A JPEG that has been opened, rotated, and resaved is not an original file, but a JPEG that has been losslessly rotated by changing the header data can be considered an original.
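
One way to test whether a copy is "unchanged in any way" is to compare checksums of the two files. The following is a minimal Python sketch, not a method taken from this book; the function names are hypothetical. Note that it is stricter than the definition above: a metadata-only edit or a lossless rotation changes the bytes of the file, so this check reports a difference even though the image data is untouched.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MB chunks so large image files do not fill memory
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_identical_copy(original: Path, copy: Path) -> bool:
    """True when both files contain exactly the same bytes."""
    return file_digest(original) == file_digest(copy)
```

A copy that passes this check can safely be treated as a second original. A copy that fails it is not necessarily a derivative; it may only carry different metadata.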

Build Your System for the Data Lifecycle

I suggest making several divisions in your storage system to reduce costs and increase security (Figure 4-3). In this chapter, we’re going to look at the working life-cycle phase, as well as the archive phase (ingestion is part of the working phase as far as directory structure is concerned). As we will see in Chapter 6, there are some important file handling differences between working and archive files. Working files need a more rigorous (and probably more expensive) backup system, since the files are changing rapidly. Archive files have reached a more stable state and can be protected by a more secure and cost-effective backup routine. We want the directory structure to reflect this difference and help facilitate the processes involved.

Figure 4-3. Here is a way to organize the directories for several stages of the images’ lives. Images are downloaded to Working files, archived as original files, and then archived as derivatives, if needed.

Separate Originals and Derivative Files

For many users, it will make the most sense to separate original files from derivatives. The most basic reason for this is that derivatives are often made after (sometimes long after) you have archived the originals. In fact you may rework your most valuable images several times over their lifetime. While it’s natural to want to save these files back into the folder with the originals, this can really mess up the backup, validation, and restoration of the archive. We’ll look more closely at these issues in Chapter 6.

For the purposes of this discussion, let’s assume that the original is the image produced by a digital camera, and that a derivative is a TIFF or PSD file produced by some kind of Photoshop work that can’t be done in PIEware—in other words, something that has to be saved in a new, differently formatted file.

Working File Directory Structure

The working part of the directory structure is a place to keep your works in progress. Because of the large size of the data, I suggest handling it differently from your word processor documents, email messages, and other smaller files that are in your users folder. I suggest creating a Working folder in the root directory of the drive, as shown in Figure 4-4. Inside this folder, you will keep images that have not yet been archived, as well as your image catalogs, master files in progress, any movie or audio files you’re working on, and so on. In Chapter 6 we’ll look at how to protect this data, and the reason for keeping it separate will become clearer.

Figure 4-4. A Working folder in the root directory of the hard drive. I keep my imaging works in progress in here.

An Image Workflow Pipeline

As images are downloaded, processed, and prepared for archive, you can use a folder structure to track and manage the files. There are several advantages to setting up a dedicated space for downloading and preparing your camera originals. First, it creates an easy-to-find space to target your download and to find the images later. Using a working files folder that is the same on every computer can also help you if you use both a laptop and a desktop. By keeping the folder structure standardized, you make it a lot easier to move back and forth between machines. And since these images have not been archived and gotten full backups, we’ll want to take an extra measure of precaution to protect them.

Figure 4-5. A simple image workflow pipeline. We’ll look at other configurations in later chapters.

We’ll look at the image workflow pipeline in great detail in Chapters 8 through 10, but the basic components are outlined below. I use these numbered folders as a way to track the image on the way from card to archive (Figure 4-5). The folder arrangement shown here is only one way that you can configure the pipeline; I’ll outline other configurations that you might want to use in the workflow chapters.

The ingestion process drops renamed images into the #1 folder with bulk metadata attached.
Once you have rated and adjusted them, images go into the #2 folder, where they await conversion to DNG.
The DNG files land in the #3 folder after conversion.
DNG files are moved into the #4 folder, where they are put into buckets and await transfer to the archive.
After transfer, images are moved to the #5 folder where they stay until the new archive copy is backed up. They can then be erased.
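
If you want the same pipeline folders on every machine, a short script can create them. This is a minimal Python sketch under stated assumptions: the text above gives only the numbers 1 through 5 for the stages, so the folder labels below are hypothetical, as is the function name.

```python
from pathlib import Path

# Hypothetical labels: only the stage numbers 1-5 come from the text above
PIPELINE_STAGES = [
    "1_Ingested",
    "2_Rated_Awaiting_DNG",
    "3_DNG_Converted",
    "4_Bucketed_For_Archive",
    "5_Transferred_Awaiting_Backup",
]


def create_pipeline(working_root: Path) -> list[Path]:
    """Create the numbered stage folders inside the Working folder."""
    created = []
    for stage in PIPELINE_STAGES:
        folder = working_root / stage
        # exist_ok makes the script safe to run again on a second machine
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Running `create_pipeline(Path("/Working"))` on both a laptop and a desktop would give each the same standardized structure; the `/Working` path is only an example.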

You Might Be Wondering ...

Shouldn’t I Use My Pictures Folder for My Pictures?

Both the Mac and the PC use the concept of a users folder to keep documents and settings for a particular user together. This enables multiple people to use the same computer and have things customized the way each of them likes it. Most programs expect that you will put your documents and pictures somewhere in your users folder.

Instead, I like to use the root directory of the hard drive to store my image documents (the root directory is the top level, inside the Macintosh HD or the C: drive). I do this for several reasons. The first is that my image documents get a separate backup from my user documents like email messages, program settings, and word processing documents. The image documents are much larger than the text documents, and have a different set of backup needs. Lightroom catalogs, in particular, should not be backed up in the same way as text documents; this would slow the backup dramatically, and would often result in a bad copy of the Lightroom catalog (if the catalog was open when backed up).

Other Working Files

You’ll notice that there are some other files in the Working folder shown in Figure 4-6—Expression Media catalogs, Lightroom catalogs, a drop folder, derivatives in progress, and movie files. I keep all of these in this special folder so that I can back the files up with an easy automated program. If you are a Time Machine user, you may want to exclude this folder from backup, since large volumes of media files and Lightroom catalogs play havoc with that software. We’ll talk more about backups in Chapter 6.

Figure 4-6. I keep other kinds of works in progress in the Working folder, such as catalogs, movies, and audio files in progress.

Archive Directory Structure: An Overview

As your collection of digital images grows, a larger percentage of it will be archive files rather than working files. When you start shooting digital, none of your files are archive files. By the time you have been shooting digital for a decade, your collection will be over 95% archive files. This, of course, is similar to film archives, where only a small portion of your photography collection is new—most of it will be archived images.

The archive should be an orderly and secure permanent home for your images. Let’s take a look at how you can structure your archive to achieve that goal.

You Might Be Wondering ...

What Makes a File Ready to Archive?

I know it sounds obvious, but images should be put into the archive only when they are ready to be put there. For original raw files, it’s essential to make sure the images have their permanent names, and it’s best if they also have bulk metadata and maybe some kind of rating. Beyond that, there are two ways to approach it, depending on the software you use. If you are using a combination of ACR and Expression Media, for instance, you’ll want to make sure your files have a good enough “proofing correction” so that Expression can output from the DNG and give good results. If you are managing the archive with Lightroom or Bibble 5, you can archive the files before proofing adjustments, since the software that manages the files can also make custom image adjustments.

You should add original JPEG files to the archive only once you have renamed them and added basic metadata. Derivative files are generally ready to be put away once you have finished working with those particular versions of the files and are totally satisfied with them. By putting images away as soon as possible, you reduce clutter and confusion and simplify the backup process.

Segregating Original and Derivative Files

Figure 4-7 shows two hard drive directories: one for original files and one for derivative images. I suggest making a parent directory structure for each kind of image. This division will assist in streamlining the backup procedures, as well as maintaining file integrity and simplifying future file migration.

Figure 4-7. Separating original and derivative files makes archiving the images more straightforward because it allows you to archive entire shoots even if you have not yet created all final master files.

For those who are still thinking in analog terms, it may seem to make sense to keep all the Shoot X files together in the same folder and on the same drive. I suggest, however, that you segregate all files into groups of originals and derivatives. As software moves forward and you can accomplish more of your work with PIEware rendering tools, you’ll need to make fewer derivative files. What required Photoshop two years ago can now be done faster, nondestructively, and with a smaller file-size footprint with Lightroom or ACR. In many cases, you’ll want to revisit original images as the software improves.

One of the most intuitive reasons for keeping originals and derivatives together is the need to find one from the other. If folders are your main method of tracking and finding files, it’s important to keep them near each other. However, if you use catalog software to manage your collection, you can easily find multiple versions of the image without having to depend on folder structure to associate the files (Figure 4-8).

Figure 4-8. If you use unique file names, you can find a master file from an original pretty easily, particularly with catalog software.
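
Catalog software does this matching for you, but the idea can be illustrated with a few lines of Python. This is a sketch only, and it rests on an assumption that is not stated in this section: that a derivative keeps the original's base name and appends a suffix after an underscore. The file names and the function name are hypothetical.

```python
from pathlib import PurePath


def find_derivatives(original: str, candidates: list[str]) -> list[str]:
    """Return candidate files whose base name is built on the original's."""
    base = PurePath(original).stem
    matches = []
    for candidate in candidates:
        stem = PurePath(candidate).stem
        # Match the same base name, or the base name followed by "_suffix"
        if stem == base or stem.startswith(base + "_"):
            matches.append(candidate)
    return matches
```

With an original named `Shoot_0012.dng`, this would return `Shoot_0012_Master.tif` but not `Shoot_0013_Master.tif`, no matter which drive or folder each file lives in.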

Segregating originals and derivatives is a good way to organize your files, particularly from a file-handling point of view:

When the need arises to migrate the files (such as for the conversion of legacy raw files to DNG), it will be easier to batch everything on an entire drive, rather than having to migrate original raw files commingled with derivative images.
For images shot as in-camera JPEG files, it will be immediately clear which are the originals and which are the derivative files, so you won’t accidentally overwrite the originals.
If you are looking for a file, you will generally know before you start searching whether you are looking for an original file or a derivative one. Thus, your search will already be narrowed.
Most importantly, the conversion from original to derivative may not happen for quite some time, particularly for personal work. In this case, commingling file types complicates the backup process, as you will be scattering the new files among a much larger group of older, already backed-up raw files. If you build a directory structure that lumps together old and new files, your backup procedures will be much more expensive, complicated, and time-consuming or, alternatively, less safe.

Note

The bucket system is designed to take advantage of some media storage techniques described in the next chapter and in Chapter 6. By using this system, you will be able to maximize the reliability, clarity, scalability, and ease of management of your archive system. Digital photography can seem like a never-ending spending treadmill, so I have developed techniques that enable you to get as much functionality and reliability as possible, as economically as possible.

Enter the Bucket Brigade

Now that we have divided our archive images into two main groups—originals and derivatives—we need to come up with an internal structure for those groups. As with all aspects of our archive, we will strive here for simplicity, durability, and scalability.

My system, which I call the bucket system (Figure 4-9), is a deceptively simple technique that addresses a number of the challenges that the digital photographer faces. Images are put away into sequential folders as they are prepared for archive. These sequential folders correspond to the write-once media component of the backup (CD, DVD, or Blu-ray). This structure makes it easy for you to determine that images have all been backed up, and it makes it easy for you to restore the archive in the event of drive failure.

Figure 4-9. Putting images in buckets is a handy way to organize them in a way that helps you address the challenges of long-term storage.

Here are some of the bucket system’s primary characteristics:

It’s simple and it lets you put images into their permanent homes quickly.
It’s visually orderly.
It’s scalable: you just add more buckets to add more images, and you can easily combine smaller buckets into larger buckets if you upgrade your storage media.
It provides an orderly progression of your image files from the recent archive to the deep archive as the images age.
It’s easy to confirm the existence of backups.
It’s easily restorable in the event of calamity.
It forms a file structure that’s well suited to migration through multiple data storage configurations over the life of the archive.

Okay, so what do these magical buckets look like? One of my original file storage drives is shown in Figure 4-10.

Archive Directory Structure: Making Buckets

At the heart of the bucket system is a very simple idea: put your images into folders that line up in numerical sequence. When a folder becomes full (as described below), you simply make a new folder with the next sequence number.

By structuring your directory in this way, you create a natural progression from old files to new. As you add new images to your collection, they show up at the end of the sequence, and older images are “pushed” deeper into the archive. This directory structure makes adding files easy and logical. It keeps your files organized, provides easy confirmation of backups, and simplifies the workflow.
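
The "next sequence number" step can be sketched in Python. This is an illustration under assumptions: it takes bucket names that begin with a number, like the `116_Misc` example in this chapter, and pads new numbers to three digits. The function name and the padding width are hypothetical choices.

```python
import re

# A bucket name is assumed to start with its sequence number, e.g. "116_Misc"
BUCKET_NUMBER = re.compile(r"^(\d+)")


def next_bucket_name(existing: list[str], label: str = "", width: int = 3) -> str:
    """Return the name of the bucket that follows the highest existing one."""
    numbers = []
    for name in existing:
        match = BUCKET_NUMBER.match(name)
        if match:
            numbers.append(int(match.group(1)))
    # Start at 1 when there are no buckets yet
    next_number = max(numbers, default=0) + 1
    prefix = f"{next_number:0{width}d}"
    return f"{prefix}_{label}" if label else prefix
```

Zero-padding keeps the folders in numerical order when the operating system sorts them by name.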

Figure 4-10. I keep my original files in buckets that live on a series of drives labeled Original_01, Original_02, and Original_03. The Blu-ray-sized buckets of images live on the drive. Small assignments are grouped together in a bucket like 116_Misc. Large assignments, like the Africa pictures, can be spread across multiple buckets.

Determining the Bucket Size

There are a number of strategies for determining what constitutes a bucket. To find the size that’s right for you, you’ll need to take into account how you shoot and what your storage medium is. We’ll start by looking at a system that employs optical disks, and then look at some other ways to set up your buckets.

Sizing your buckets according to your backup medium

I recommend using write-once media as part of your backup strategy. So that this works as cleanly as possible, each of your buckets should fit neatly onto your chosen storage media. As shown in Figure 4-11, this makes for an easy correlation between the original bucket and each of the backup copies (you will see how nicely this logic works in Chapter 6). Each bucket is a self-contained group of files, and it’s easy to see that a copy of each is present by its sequential organization.
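
Because every bucket has the same name on the primary drive and on each backup, confirming that a copy exists reduces to comparing two lists of folder names. Here is a minimal Python sketch of that comparison; the function names are hypothetical, and the check confirms only that a bucket is present, not that its contents are intact.

```python
from pathlib import Path


def bucket_names(drive: Path) -> set[str]:
    """Collect the names of the bucket folders at the top level of a drive."""
    return {entry.name for entry in drive.iterdir() if entry.is_dir()}


def missing_buckets(primary: set[str], backup: set[str]) -> list[str]:
    """List buckets that are on the primary drive but not on the backup."""
    return sorted(primary - backup)
```

An empty result from `missing_buckets` means every bucket on the primary drive has a counterpart on the backup.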

Obviously, before you determine how much data to put into each folder, you should decide which backup media you will be using. When I used to back up my images on DVDs, I limited the size of each folder to what would fit comfortably on a DVD, which is about 4.3 GB of data. If you shoot only JPEGs, you may want to use a CD-sized bucket (about 700 MB of data).

Figure 4-11. The bucket system enables easy confirmation that you have backed up your image files, even if you are using different-sized storage devices—or different media entirely—for your backups. From left to right, these images represent the primary drive with the master original files, the hard drive backups of the originals (perhaps on smaller disks), and the second backup copies on optical media.

Figure 4-12. When I changed from DVD to Blu-ray for my backup optical media, I combined existing buckets into new Blu-ray-sized ones. In this figure, you see how DVD buckets 510–514 ended up inside Blu-ray bucket 104.

I’ve now gone to Blu-ray as my write-once backup medium, so I make my buckets 23 GB. For images that were archived before the switch to Blu-ray, I’ve combined DVD buckets to make a smaller number of Blu-ray buckets (Figure 4-12). You may not want to reburn those older archived files to Blu-ray right away, since the cost of Blu-ray disks is still pretty high (as of this writing). The price should drop relatively quickly to be comparable to the cost of a DVD as more factories come online.

You Might Be Wondering ...

How Do I Split My Files into the Right Sizes?

Once you decide to split your archive into chunks of a certain size, you need to figure out how to go about actually doing it. You can use the tools in the operating system to show you how much a folder holds. On the Mac, you can go into List view in the Finder, choose View > Show View Options, and then enable Calculate All Sizes (Figure 4-13). In Vista, you can right-click a folder and choose Properties, or mouse over the folder (Figure 4-14).

I prefer to use my DAM software to split the images into bucket sizes just prior to archiving the files. Unfortunately, I don’t know of any tools that will automatically split files into size-limited folders—you will have to do it manually. I’ve found that this is a pretty quick process in Expression Media. I show you how to do this in Chapter 11.
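
A short script can at least plan the split for you before you move anything by hand. The following Python sketch is not a tool referred to in this book; it is a simple illustration with hypothetical function names. It walks the files in name order and starts a new bucket whenever the next file would push the current one past the capacity. The 23 GB figure comes from this chapter, and treating 1 GB as 10^9 bytes is an assumption.

```python
from pathlib import Path

# Bucket size from this chapter; 1 GB = 10**9 bytes is an assumption
BLU_RAY_BYTES = 23 * 10**9


def list_files(folder: Path) -> list[tuple[str, int]]:
    """Return (path, size in bytes) for every file under folder, sorted by name."""
    return [(str(p), p.stat().st_size) for p in sorted(folder.rglob("*")) if p.is_file()]


def plan_buckets(files: list[tuple[str, int]], capacity: int) -> list[list[str]]:
    """Group files, in order, into buckets that each fit within capacity."""
    buckets: list[list[str]] = []
    current: list[str] = []
    used = 0
    for name, size in files:
        if size > capacity:
            raise ValueError(f"{name} is larger than one bucket")
        # Close the current bucket when the next file would overflow it
        if current and used + size > capacity:
            buckets.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        buckets.append(current)
    return buckets
```

The plan preserves file order, so a shoot stays in sequence even when it spans two buckets. It only produces lists of names; moving the files, and deciding whether to break a shoot at a more natural point, is still up to you.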

Figure 4-13. On the Mac OS, you can view a list of folders that includes the size of the images in the folders. Be aware that this list does not always update quickly, so it’s not the best place to do the dividing.

Figure 4-14. When you mouse over a folder in Vista, you can see the size of the folder, including all contents.

Using date buckets with optical disk backup

Organizing your bucket folders by date is a common way to archive images, since it provides an orderly sequence to the photos. You can create a set of folders like the ones shown in Figure 4-15, which provide a sequence and a hierarchy for the image folders. There are a couple of issues to consider if you want to take this approach.

The first challenge is that for most people, date-limited folders are going to be of different sizes, which will make write-once media backup hard to implement. If you generally shoot no more than, say, 3 GB per month, most months will fit on one DVD. How do you handle months, however, where you might shoot more than 3 GB? You have to move to an inconsistent bucket system where some buckets represent one month and some represent part of a month. This can get messy, particularly when you need to restore the archive after some kind of failure.

Figure 4-15. If you use Date folders for your buckets, it’s easiest if all of the time chunks can fit inside the same write-once media size. You’ll also want to make sure to get each month’s files processed before burning that month’s disk.

If budget allows, you could move to a write-once backup that is going to always be oversized, as Blu-ray would be for many people. With Blu-ray disks costing $5 each (at the time of this writing and falling fast), it would not be terribly extravagant to burn a disk monthly, regardless of how much data is in the folder. Of course you could forgo write-once media altogether, but after reading Chapter 5 and Chapter 6, you may see why I think that’s risky.

There’s another challenge that will come up for some people—I know it would for me. It can be inconvenient to process things in the order they were shot. I have sometimes done shoots that sit in the Working folder for quite some time until they are ready to archive. Maybe I get busy with other work or with travel. Maybe I know it’s going to be a chore to get the images looking right and I simply procrastinate. The result is that sometimes images don’t make it all the way through the workflow pipeline for months, long after later shoots have been processed and archived. By insisting on a date-based folder hierarchy, I would make the workflow and archiving process less flexible, and more subservient to the unyielding governance of chronology.

Using date buckets with unmatching disk backups

For photographers who want the structure of date-limited buckets in their archives and want the protection of write-once media, there’s another option. You could burn a backup CD, DVD, or Blu-ray disk early in the workflow process and keep it as a disaster recovery copy without worrying too much about making it match the primary storage structure (Figure 4-16). In the event that you lose the primary and backup hard drive copies of the archive, you will still have a copy on optical disk. If the files on the disk had the same unique names as the files in the archive, it might be reasonably easy to put the collection back together the way it was.

It also might be very difficult to reconstruct the archive, depending on how complete the optical disk archive is, whether there is any duplication, and how your DAM software handles finding files in different directory structures. If you’re going to take this approach, you need to be very consistent about exactly when you burn your images off to disk, since it will be difficult to cross-check the completeness later.

Note

While I don’t recommend using unmatched backups for your write-once media, it beats the heck out of having no write-once backups.

Figure 4-16. You can also make backups that don’t correspond exactly to the way your images are archived.

Folder structure with no optical disk backup

If you don’t use optical media as part of your backup plan, you have unlimited flexibility in terms of folder size. You can use whatever organizational structure you want—dates, job numbers, whatever. While I’d still generally suggest some kind of structured, sequential folder hierarchy, it can be more free-flowing. You also don’t have to worry about segregating old and new work since you can back everything up from the primary to the backup(s) as you create it.

If you are depending only on hard drives to store your backups, however, you need to realize that viruses, volume corruption, or possibly human error could cause the loss of the entire archive or a large chunk of the archive. As you consider whether the hassle of burning optical disks is worthwhile, take a moment to consider the value of the collection as a whole or what you might be forced to pay for data recovery in the event of catastrophic loss (think: second mortgage).

If, however, you are in a network environment with a backup like LTO digital tape, it’s likely that you can forgo buckets altogether without much risk of data loss. Digital tape has the advantage of saving changes to image files without losing the previous version. This adds significant protection against viruses, volume corruption, and accidental deletion. But it comes at a price—currently between $5000 and $10,000. It’s a reasonable amount for a company to spend to protect a corporate image collection, but a lot for an individual who just wants to avoid burning Blu-ray disks.

Naming the Buckets

In this section, I’ll show you how I name the folders in my archive, and the logic behind the naming. The naming scheme for the buckets needs to do a couple of things: it needs to tell you in broad terms what kind of files the bucket contains (for instance, original or derivative files), and it needs to make the buckets line up sequentially, from oldest to newest. As long as your names accomplish these two tasks, you have quite a bit of flexibility in choosing a naming scheme. Figure 4-17 shows an example of the format I have chosen for the folders containing my original files.

Figure 4-17. The anatomy of a bucket name.

Here’s a breakdown of my naming scheme:

Prefix. I put the term “RAWBLU” at the beginning to tell me that these are camera originals. In my case, originals can be either raw files, in-camera original JPEGs, or the .mov movie files that my point-and-shoot camera makes. My derivative files get the prefix “BLUDRV” (when I used DVD, I just used the prefixes “RAW” and “DVD,” but I have updated that for Blu-ray storage).
Sequence number. I use a three-digit number as a sequence number for easy identification. Note that this only works for 999 folders. When I was using DVD, I got to the mid-500s before switching to Blu-ray. I’m pretty confident that a new storage medium will be available before I get to RAWBLU_999. If you are archiving for a collection that will exceed 20 GB × 1,000 (20 terabytes), I suggest adding an extra digit to the sequence number.
Content. When a bucket is made of images from a particular shoot, I’ll append a shoot name to the end of the bucket name. While I don’t depend on this folder naming to be a principal navigational tool, it can be helpful at times.

This was a more helpful tool when my buckets were sized for DVD. With Blu-ray buckets, I find that I am generating a lot of “Misc” buckets (as shown in Figure 4-18), since most of my shoots are less than 23 GB.

Figure 4-18. If a bucket holds a number of different shoots, the content part of the name can just say Misc.
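The three parts of the bucket name can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and joining the content part with an underscore is an assumption about the format, so adjust it to match your own scheme.

```python
import re

# Assumed layout: PREFIX_NNN or PREFIX_NNN_Content (underscore separators).
BUCKET_RE = re.compile(r"([A-Z]+)_(\d{3})(?:_([A-Za-z0-9_-]+))?")


def make_bucket_name(prefix, number, content=None):
    """Build a bucket name such as RAWBLU_156 or RAWBLU_156_Misc."""
    if not 1 <= number <= 999:
        raise ValueError("a three-digit sequence number covers 001-999 only")
    name = f"{prefix}_{number:03d}"
    return f"{name}_{content}" if content else name


def parse_bucket_name(name):
    """Return (prefix, number, content), or None if the name does not fit."""
    match = BUCKET_RE.fullmatch(name)
    if match is None:
        return None
    prefix, number, content = match.groups()
    return prefix, int(number), content
```

Zero-padding the number to three digits is what keeps the folders lined up from oldest to newest in an ordinary alphabetical listing.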

You Might Be Wondering ...

Why a Three-Digit Sequence Number?

You might be inclined to use a date instead of a simple sequence number to make your files line up. Indeed, if you use a database-style date (i.e., year/month/day, or YYMMDD), the files will line up in order. However, in my opinion, sequentially numbered buckets offer several advantages. Many of these are not obvious until you have a hundred buckets or more; simple sequence numbers are much more helpful at that point.

A sequentially numbered list of folders is much easier to visually scan through than one that has gaps in the numbering (as date-named folders generally would).
The sequentially numbered folders are easier to manage in data migration, as we’ll see in Chapter 12. It’s easier to check for completeness and to see gaps in transfer.
The buckets line up a lot neater when you want to correlate primary to backup.
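The completeness check mentioned above is easy to automate when the buckets are numbered sequentially. Here is a minimal Python sketch, assuming bucket names of the form PREFIX_NNN with an optional content part after a second underscore; the function name is hypothetical.

```python
import re


def missing_sequence_numbers(folder_names, prefix="RAWBLU"):
    """Return the sequence numbers absent between the lowest and highest found."""
    pattern = re.compile(rf"^{re.escape(prefix)}_(\d{{3}})(?:_|$)")
    numbers = set()
    for name in folder_names:
        match = pattern.match(name)
        if match:
            numbers.add(int(match.group(1)))
    if not numbers:
        return []
    return [n for n in range(min(numbers), max(numbers) + 1) if n not in numbers]
```

With date-named folders there is no equivalent test, because gaps in the dates are normal and tell you nothing about whether a folder is missing.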

You Might Be Wondering ...

Can’t I Just Name My Folders and DVDs According to the Client?

There are a couple of problems with naming top-level folders after clients and not including a numerical designation, particularly if you use CDs, DVDs, or Blu-ray discs as your backup media. The first is that few of your jobs will be the right size to fill up their own discs. This means that different jobs will probably be grouped together on a single disc, and some jobs will have to be divided between multiple discs. In addition to being inefficient (because you’re likely to wind up with various partially used discs), this makes for a confusing filing system. If images from Job D are grouped with images from Job X, where do you file the disc, under D or under X? This gets even trickier for personal work: do you file it under Family, Vacation, or Grand Canyon?

An additional problem with client naming is that new work is dropped in the middle of the alphabetical sequence, making it harder to ascertain which images have been backed up. With the numerical bucket system that I advocate, it’s very simple to determine that folder RAWBLU_156 has in fact been copied to your backup hard drive as RAWBLU_156 and burned to backup optical disk RAWBLU_156.
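As a rough illustration of that check, the following Python sketch lists the bucket folders on a primary drive that have no same-named folder on a backup drive. The function names and the two root paths are hypothetical, and the comparison is by folder name only; it does not verify the files inside.

```python
import os


def bucket_names(root):
    """Return the names of the folders directly inside root."""
    return {
        entry
        for entry in os.listdir(root)
        if os.path.isdir(os.path.join(root, entry))
    }


def missing_from_backup(primary_root, backup_root):
    """List buckets present on the primary drive but absent from the backup."""
    return sorted(bucket_names(primary_root) - bucket_names(backup_root))
```

Because the names sort sequentially, the result reads as a simple to-do list of buckets that still need to be copied.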

Restoration of your archive, should you ever have to do it, will also be much easier if you are simply loading sequential folders from a hard drive or from a set of optical disks. Keep in mind that you will be needing these backups because you just experienced a fire/theft/virus/hard drive failure. This will be a stressful time, and simplicity will be your friend.

The most important reason to implement my system, however, is that it’s easier in terms of day-to-day workflow. You simply look at the most recent folder, drop in images if there’s room, and make a new one when it’s full. No important decision making as to whether the folder should be named for the agency, the client, or the subject matter is required.

Warning

Make sure you don’t use any nonstandard punctuation in your folder names. You might be tempted, for instance, to include a / in a date name. This may cause hard-to-diagnose problems to appear sometime later. To be safe, only use numbers, letters, dashes, or underscores in your folder names.
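The rule in this warning is simple enough to check with a short script. This Python sketch is an illustration with a hypothetical function name; it accepts only the characters listed above.

```python
import re

# Letters, numbers, dashes, and underscores only.
SAFE_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def is_safe_folder_name(name):
    """Return True if name uses only letters, numbers, dashes, or underscores."""
    return SAFE_NAME_RE.fullmatch(name) is not None
```

A name such as RAWBLU_156 passes, while a date written as 08/12/25 fails because of the slashes.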

Naming buckets for derivative images

Since I use different directory structures for original and derivative images, I have a slightly different naming convention for each. I call the derivative image buckets BLUDRV_001 and so on. BLU indicates it’s a Blu-ray disk, and DRV indicates it’s a derivative. You can use any other combination of letters that makes sense to you—as long as you’re consistent it doesn’t make a huge difference, although I do recommend that you keep it short. Figure 4-19 shows how my buckets for derivatives look.

You Might Be Wondering ...

Spaces or Underscores in File Names

You’ll see in some later chapters that I use underscores in folder names in the PC figures. There are times that spaces in folder names can cause Windows programs to malfunction. The safest way to name folders in a Windows environment is to use underscores instead of spaces. This is not needed on the Mac OS, due to file-handling differences at a system level.

Regardless of which platform you are on, file names, particularly image file names, should never have spaces.

Figure 4-19. My derivative files drive is structured in much the same way as my original files drive, except that I don’t use content as part of the bucket name—you may want to include it, but I find many derivative buckets would be labeled Misc, and that would not serve much purpose.

Naming subfolders

The subfolders within my DVD-sized buckets are each titled with a short reference to the subject matter they contain, such as client, agency, subject matter, or some combination thereof. This is the only directory-based content management that I use. Keeping this information nested inside a numerically sequential directory structure means that it’s still available, but it doesn’t confuse the directory structure.

As you download images from your memory cards and prepare them for their permanent homes in the directory structure, you can put keywords into the subfolder names. As in all of your archive workflow, you will do yourself a favor to use a controlled—or at least consistent—vocabulary. For instance, you might call personal images of your family Kids, Family, FAM, home, or something else along those lines. Decide on a term you like and stick with it.

Figure 4-20 shows how I use subfolder names to indicate basic information about the files inside. This directory-based information is particularly useful as I make my first groupings with my cataloging software. By searching the directory for all image files with “Family” somewhere in the pathname, I can quickly make a virtual set that includes all these files.

You may also wish to append a date to these subfolders so that you can distinguish subfolders with the same content info—for example, Family_081225 and Family_080704—and not risk any folder collisions.

Figure 4-20. You can name subfolders to include basic information about the contents.
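Your cataloging software will normally do the pathname search for you. Purely as an illustration of the idea, here is a Python sketch with hypothetical function names: one builds a dated subfolder name in the style of Family_081225, and the other collects every file whose path under the archive root contains a given keyword.

```python
from datetime import date
from pathlib import Path


def dated_subfolder(keyword, shot_on):
    """Return a subfolder name such as Family_081225 (keyword plus YYMMDD)."""
    return f"{keyword}_{shot_on:%y%m%d}"


def files_with_keyword(root, keyword):
    """Return files under root whose relative path contains keyword."""
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and keyword in path.relative_to(root).as_posix()
    )
```

The search is case-sensitive, which is one more reason to settle on a single spelling for each term and stick with it.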

Combining Buckets

As technology marches along, larger storage solutions will come to market. Those of us who started out using 44 MB SyQuest cartridges have seen the advent of numerous storage media improvements: 100 MB zip disks, 680 MB CDs, 4.4 GB DVDs, and now, Blu-ray discs with capacities of 25 or 50 GB. The beauty of the bucket system is that it’s infinitely scalable—you can simply combine smaller folders together to make bigger ones so that the information structure stays stable (Figure 4-21).

You Might Be Wondering ...

Am I Going to Need Buckets Forever?

Making buckets and burning write-once media like CD, DVD, and Blu-ray can seem like a pain in the neck, and some people are tempted to do without. Wouldn’t it be nice to skip these steps? Yes it would, but it’s a risk, at least for now. Hard drive storage is vulnerable to a lot of hazards that don’t affect write-once media, like viruses, volume corruption, accidental erasure, and other inadvertent damage. I consider it my job to get you from this imperfect present to a more perfect future with your photos intact (the prime directive!).

As we get better data validation tools, the risk of hard-drive-only storage will diminish. We’ll be able to easily check the integrity of both the primary archive and the backup copies with absolute certainty. This will help us recover from storage problems before it’s too late. More in Chapter 6.

Figure 4-21. Because you can put smaller buckets into larger buckets, your collection will scale well.

There are two different ways to combine smaller buckets into larger ones: either you can place existing folders intact into a new, larger folder or you can dump the contents of the folders together into the new folder. The critical issue is that your cataloging software needs to keep track of the changes. As long as it can still “see” the image files in the new directory structure, you will have an easy transition.

Figure 4-22. As you move to a larger bucket size, you will want to combine the smaller buckets into larger ones. This keeps the information structure stable and provides for reasonably simple data migration.

I think it makes the most sense to place the folders intact into the larger buckets, as shown in Figure 4-22. This is what it looked like when I changed to Blu-ray for backup.
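The first approach, placing the old folders intact inside a new one, can be sketched in Python as follows. This is an illustration under stated assumptions: the function name is hypothetical, and folder names such as RAW_510 and RAWBLU_104 in the usage note are examples only. Moving folders outside your cataloging software may also break its links to the files, so check how your catalog handles relocated folders before trying anything like this on a real archive.

```python
import shutil
from pathlib import Path


def combine_buckets(archive_root, old_names, new_name):
    """Move existing bucket folders, intact, into a new larger bucket."""
    root = Path(archive_root)
    sources = [root / name for name in old_names]
    # Check everything up front so a typo does not leave a half-finished move.
    missing = [source.name for source in sources if not source.is_dir()]
    if missing:
        raise FileNotFoundError(f"missing buckets: {missing}")
    target = root / new_name
    target.mkdir()  # raises FileExistsError if the new bucket already exists
    for source in sources:
        shutil.move(str(source), str(target / source.name))
    return target
```

A call such as combine_buckets(archive, ["RAW_510", "RAW_511"], "RAWBLU_104") would leave the old folders, with their old names, nested inside the new bucket, so existing DVD backups still match what they contain.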

The Fine Art of File Naming

File naming is the most basic component of your file system structure. As such, it’s important that you choose a method of naming files that you can grow with and use for a long time.

Most imaging software has batch-renaming utilities that allow you to select among various types of text and data to use in the renaming process. In this section we will take a look at the desired end result. We’ll examine how to actually achieve this result in later chapters.

You Might Be Wondering ...

Do I Have to Combine Buckets When I Go to Larger Optical Media?

If you switch to a larger optical medium for buckets, you will be faced with the prospect of updating all past bucket sizes to the new size. This process consists of two parts. One is gathering past work into the new bucket size, and the other is burning the older buckets to the new media type.

I suggest gathering the old buckets into the new bucket size for the entire collection. The main reason for this is that you’re probably upgrading media size because you have a lot of existing buckets (if you only have 10 DVD buckets, moving to Blu-ray doesn’t make a lot of sense). By combining buckets right away, you reduce visual clutter in the archive and you get a more accurate count of where your new bucket numbering should start.

That does not mean that you need to immediately burn the new buckets to the new media type. When I migrated to Blu-ray, I combined the DVD buckets into Blu-ray-size ones, but have not yet burned the old work to Blu-ray disks. Since the images are still in DVD buckets inside the Blu-ray buckets, my existing DVD backup is still as valid as it ever was. I’ll burn to Blu-ray once the disk price falls a bit more.

Naming Original Files

Digital cameras generate huge numbers of files with similar or identical names. As of this writing, no cameras that I am aware of will give you sufficient options to automatically generate unique file names. And even as they do begin to offer this option, you will still have to decide on a naming system that you like.

Here are the most important characteristics of a naming system:

Each image file should have a unique file name. This helps in all sorts of ways, from preventing accidental overwriting, to assisting in any client communications about the files, to archive reconstruction. Using the date taken as part of the file name is probably the easiest method of making sure that you do not give a file a name that has already been given to another image.
Each file name should end in a three-letter extension preceded by a period. For Windows users, this will be second nature. For Mac users, explicitly including file extensions (e.g., .NEF, .CRW, or .JPG) is still optional; however, using the extension ensures greater compatibility.
File names should be no more than 31 characters long, and the only punctuation should be underscores, dashes, and a single period just prior to the file extension. Although most computer systems can support names longer than 31 characters, we want to structure our practices to be universal. Keep it to 31 characters (plus the extension), and you will be safe. Again, Mac users may have to get used to not including spaces and other punctuation in file names, but this will ensure greater cross-compatibility. The compatibility problems that spaces and other punctuation cause are disappearing, but they aren’t gone. Most of the issues show up in web-based environments.
The naming system should be regularized and universally applied. If you implement a standardized naming convention, you’ll save time and avoid errors.
Including a database-style date component in the name can help you line up all your files in the order they were taken. Database-style dates start with the year, then the month, then the day. This keeps files in chronological order. The notation for the date format usually comes in the following form: YY or YYYY indicates the year as either a two- or four-digit number, and MM and DD indicate the month and day, respectively. If you use a two-digit number for the year, your naming system will be obsolete in the year 2100. While I’d like to think that I will be shooting pictures when I’m 140 years old, the odds are against it. I have therefore chosen a format that will only really work for images shot up until 2100. Also, by dropping the first two numerals off the year, I gain two more characters to use when renaming files as derivative files (as outlined below). If you want to build a system that works forever, use a four-digit year.
Obviously, your naming system must work with whichever program you are using to generate the names. It should be easy to apply names without having to determine if they have been used previously. Again, using a date in the file name can streamline this process.
You can append tags or codes to the file names to indicate derivative versions of image files. Successive versions of the same file (a master file, for instance, or one converted to black and white) should have some kind of tag added to the file name to note the enhancements. This is generally preferable to renaming the entire file (there are times when you might want to rename a file entirely, though, as discussed in Chapters 8 through 11).
The file name does not have to carry important content information about the file. Putting content information in the file name can get complicated, and it takes up valuable character space. It can be a very time-consuming process compared to simply appending a few keywords. I’m not saying the file name can’t carry content information, just that it doesn’t have to.
A file naming system that incorporates your name can help your clients keep track of your images. As discussed below, adding a string (a sequence of letters) in the file name that identifies a file as yours will help your clients remember where it came from.
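Several of these characteristics are mechanical enough to check with a script. The Python sketch below is an illustration only, with hypothetical function names. It assumes that the 31-character limit applies to the part of the name before the extension, that digits are acceptable in an extension, and that names differing only in letter case should count as duplicates.

```python
import re

MAX_STEM_LENGTH = 31  # characters before the extension


def file_name_problems(name):
    """Return a list of rule violations for one file name (empty if none)."""
    problems = []
    stem, dot, extension = name.rpartition(".")
    if not dot:
        stem, extension = name, ""
    if not re.fullmatch(r"[A-Za-z0-9]{3}", extension):
        problems.append("extension should be three characters after a period")
    if len(stem) > MAX_STEM_LENGTH:
        problems.append("name is longer than 31 characters before the extension")
    if not re.fullmatch(r"[A-Za-z0-9_-]+", stem):
        problems.append("use only letters, digits, underscores, and dashes")
    return problems


def duplicate_names(names):
    """Return lowercased names that occur more than once, ignoring case."""
    seen, duplicates = set(), set()
    for name in names:
        key = name.lower()
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return sorted(duplicates)
```

Running a check like this over a folder before archiving it catches stray spaces and over-long names while they are still easy to fix.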

Okay, so that’s a lot of considerations. To see them in action, let’s take a look at my favorite naming system and examine why it works. The convention I have settled on for naming my digital camera files looks like this: Krogh_050428_2728.dng (Figure 4-23).

Figure 4-23. My naming convention looks like this—my name, the date taken, and a unique four-digit identifier.

Here are the elements:

The originator string. I start the name with the string “Krogh” because I want people to know the image came from me. This works for me because I have a short and reasonably uncommon name, but if your name is something like Smith or Jones, including it in the file name might be less valuable (I do know one photographer who has a common last name but an uncommon first name; he uses his first name in his file names). If your name is long (Schwarzenegger, for instance), you might also find that it’s unworkable to include it in the file name. You might decide to use your initials instead or to use no originator component at all.
When deciding on a string to use, remember that you may be appending other strings to these file names as you work with the files and that you’ll need to keep the entire name below the 31-character limit.
I put the originator string at the front of the file name, but you could put it elsewhere if you like—after the date, for instance.
The date string. After my name, I place an underscore in the file name to set it apart from the next element, which is the date. The underscore is just a little visual marker that makes the file name easier to read. I then place a database-style date in the file name. My preferred format, as indicated earlier, is usually written YYMMDD. Thus, 050201 is February 1, 2005. It’s a bit hard for us Americans to read this date format quickly at first, but for sorting purposes it makes much more sense than the way we usually write dates (month/day/year, or MMDDYY), as shown in Figure 4-24.
The unique identifier sequence number. The next part of the file name (again, separated by an underscore) is the unique identifier number. In many cases, this is the number that was incorporated in the original file name (e.g., in DSC_1234.NEF, 1234 is the unique identifier). As a matter of fact, when I am shooting with only one camera, I almost always use the unique identifier number from the original file. If I am shooting with two cameras and want to keep from having overlapping file names, I will generate an automatic sequence number in the ingestion process, rather than picking up the camera-generated one.

Figure 4-24. As you can see, if you use a database-style date near the front of the file name, your files will line up in chronological order. This is even more important if you use the DNG format because the filesystem will see date created as the date the DNG was made, not the date the picture was taken.
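In practice your ingestion or DAM software does this renaming for you. To make the three elements concrete, here is a small Python sketch with hypothetical function names. It assumes the camera's file name ends in a four-digit number before the extension, and it takes the shooting date as an argument rather than reading it from the file.

```python
import re
from datetime import date

# Assumes a camera name such as DSC_1234.NEF: four digits, then the extension.
CAMERA_NUMBER_RE = re.compile(r"(\d{4})\.[A-Za-z0-9]+$")


def camera_sequence(camera_name):
    """Return the four-digit identifier from a camera-assigned file name."""
    match = CAMERA_NUMBER_RE.search(camera_name)
    if match is None:
        raise ValueError(f"no four-digit identifier in {camera_name!r}")
    return match.group(1)


def archive_name(originator, shot_on, sequence, extension):
    """Build a name of the form Originator_YYMMDD_NNNN.ext."""
    return f"{originator}_{shot_on:%y%m%d}_{sequence}.{extension}"
```

For example, archive_name("Krogh", date(2005, 4, 28), "2728", "dng") produces Krogh_050428_2728.dng. Because the date runs from year to month to day, an ordinary alphabetical sort of such names is also a chronological one.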

Special naming considerations for JPEGs

One of the benefits of shooting a raw file is that you know that no application will write over that file (no program can, for instance, reduce the size of the file and then resave it as a raw file, forever changing the original). The only applications that can resave raw files work only on the metadata of the file, not the underlying image data.

If you shoot in-camera JPEG originals, however, your original image files are at risk—it’s possible to accidentally reduce the size of the file or make some other destructive change to it and then save over the original. It’s important, therefore, to have some way to prevent saving over your originals, as well as to have good backups of the files in case you ever need to restore the originals.

I use a combination of careful handling in the workflow pipeline and archive structure to keep from overwriting in-camera JPEGs. Let’s examine how you can use file names to address this problem.

Tip

You may be tempted to rename each batch of photographs with unique identifier sequences starting at 0001, but this will yield a very large number of files in your collection with unique identifiers of, say, 0011. Using the original, camera-assigned unique identifier gives you a range from 0001 to 9999. This makes any directory-based searches for image files much faster and easier to do.

You might choose to append the terms Orig or CO (for Camera Original) to the names of your original JPEG files (for example, Krogh_050101_1234_Orig.jpg), so that you can tell by looking at a file if it is the original. Of course, that means that you will need a way to remove this designation from the file name when you batch out the images. Bridge does not have an easy way to remove elements from multiple file names at once, but Expression Media and some other DAM applications do.

As an alternative to removing the distinguishing suffix when batching out the images, you could instead append a further designation to the new file’s name to distinguish it from the camera original. You could, for instance, add “PY” to the end of a file name ending in “CO” to get a file name that ends in “COPY.” Most DAM software makes this a fairly easy process.
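Adding or removing such a designation is a simple string operation, which is why batch-renaming tools can do it easily. A minimal Python sketch, with hypothetical function names and assuming the designation is joined to the name with an underscore:

```python
from pathlib import PurePath


def add_tag(file_name, tag):
    """Append _tag before the extension: name.jpg becomes name_tag.jpg."""
    path = PurePath(file_name)
    return f"{path.stem}_{tag}{path.suffix}"


def strip_tag(file_name, tag):
    """Remove a trailing _tag before the extension, if it is present."""
    path = PurePath(file_name)
    ending = f"_{tag}"
    stem = path.stem
    if stem.endswith(ending):
        stem = stem[: -len(ending)]
    return f"{stem}{path.suffix}"
```

Calling strip_tag on a name that does not carry the designation returns it unchanged, so the same batch step is safe to run on a mixed set of files.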

You Might Be Wondering ...

Shouldn’t I Put the Client’s Name or the Subject Matter into the Name of the File?

I originally thought this was a good idea, but I soon found that it didn’t work for me. Right off the bat, from a workflow standpoint, it was difficult. Should I use the client, agency, project, or location in the name? What if I’ve used the agency name, and I now want to sell this as stock to another agency? It also started to be a real hassle to remember which codes were used for which subject matter, particularly for personal work.

More importantly, I was interested in having the client associate me with the photographs. I figured it was more important for them to know that the image was from Peter Krogh than to know that it was prepared for Agency X.

Most of all, putting my name in the file name was just simpler. This meant that a large memory card full of personal images shot in many locations could simply be batch-renamed, without first evaluating the subject matter.

Some people feel uncomfortable leaving content out of the file name. If you are going to add content to a file name, keep the designation short; I also suggest you don’t make it the first part of the name. In Chapter 7, we’ll look at one way that you can append a client or project name to files on download.

As you decide on your file naming system, you need to evaluate how confident you are in handling your files so that you can determine whether your JPEG camera originals will need special designations in their names. I have settled on a system that does not name my JPEG files in any special way. They are named just like raw files—Krogh_080101_1234.JPG.

Naming Derivative Files

The method detailed above is the basic naming structure I use for original files. I suggest that the core of the file name (in this example, Krogh_050101_1234) should never change for internal uses. By keeping this part of the name constant, you will streamline the file-handling process. A single search on this string, for instance, should be able to identify every version you have ever made and saved of this file.

When you make a new version of the image, I suggest appending a string to the name. For instance, when I prepare a master file—a color-corrected, retouched master version of the image—I append the word “Master” to the file name (e.g., Krogh_050101_1234_Master.tif). If I make a black and white conversion, I add the letters “BW” to the file name.

Table 4-1 shows some examples of the strings I use to note different types of derivative files. As we discussed in the previous chapter on metadata, you should be as consistent as possible so that you always know what each designation represents. You should think about your workflow and the kind of photography you deliver as you decide on the naming convention you will use for derivative files.

Table 4-1. My File Naming Convention

File name	Type of file
Krogh_050101_1234.DNG	Camera original file.
Krogh_050101_1234.NEF	Camera original file.
Krogh_050101_1234_Master.tif	Color-corrected, retouched file.
Krogh_050101_1234_BW.tif	Black and white version of the color master file.
Krogh_050101_1234_MasterBW.tif	Master black and white file when a color master also exists.
Krogh_050101_1234_CMYK.tif	Image that has been converted to CMYK for offset printing.
Krogh_050101_1234_Flat.tif	Flattened version of the master file.
Krogh_050101_1234.tif	File intended for client delivery or prepared for printing or proofing. To reduce confusion, I typically remove any suffixes I’ve appended to the file name (except “CMYK”) before I deliver the file to the client.
Krogh_050101_1234.jpg	In-camera original (for in-camera JPEG originals), or a file to be delivered to a client.
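The value of keeping the core of the name constant is that one search finds every version. As an illustration, this Python sketch (hypothetical function names, and assuming the core always has the form letters, six-digit date, four-digit identifier) groups a list of file names by their core:

```python
import re

# Assumed core layout: Originator_YYMMDD_NNNN at the start of the name.
CORE_RE = re.compile(r"^[A-Za-z]+_\d{6}_\d{4}")


def core_name(file_name):
    """Return the constant core of a file name, or None if it has none."""
    match = CORE_RE.match(file_name)
    return match.group(0) if match else None


def versions_of(core, file_names):
    """Return every name in file_names that shares the given core."""
    return sorted(name for name in file_names if core_name(name) == core)
```

Given the names in Table 4-1, versions_of("Krogh_050101_1234", names) returns the original along with its Master, BW, CMYK, and Flat derivatives.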

When to Change File Names

You might be tempted to rename image files with more descriptive terms at some point (say, changing Krogh_050101_1234.tif to NewYearsDay.tif). With a few exceptions, I recommend never changing the name of a file after you have given it a unique name and worked on it. Keeping file naming constant will reduce confusion between you and your clients, and also between your present self and your future self.

One exception is when you know that your client will be renaming the files; for instance, naming product photos for the items pictured may be part of the job specifications. In this case, it helps to rename the file yourself and to put the original file name into the file’s metadata so that you can reference the original if needed. Doing so preserves the original name in the metadata, performs a service for the client, and (since I suggest keeping copies of all delivered files, even if they are just flattened copies of master files) provides a way to link a renamed file to its original raw or master file.
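
As a rough illustration of keeping that link, the Python sketch below plans a client rename and carries the original name along with it. Everything here is an assumption made for the example: plan_client_rename is a hypothetical helper, "original_filename" is a placeholder key rather than a real metadata field, and the sketch does not write anything into the image file itself.

```python
from pathlib import PurePath


def plan_client_rename(original: str, client_stem: str) -> dict:
    """Return the client-facing name plus a record of the original name."""
    ext = PurePath(original).suffix  # keep the original file extension
    return {
        "new_name": f"{client_stem}{ext}",
        # Placeholder key; map it to whichever metadata field you actually use.
        "metadata": {"original_filename": original},
    }


plan = plan_client_rename("Krogh_050101_1234.tif", "BlueWidget_Front")
print(plan["new_name"])
print(plan["metadata"]["original_filename"])
```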

The other time I rename is when the destination of the image file is my website and I want the image to be discoverable by image search engines. A descriptive name, rather than a date/sequence name, makes the file easier for Google and other search engines to locate. Google Image Search uses the name of the image file as an important determinant for search results, so descriptive terms in file names can be of real value here.
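
A descriptive web name can be generated from a caption or title. The Python sketch below is one possible approach, not a rule from this chapter: the lowercase, hyphen-separated style and the function name web_file_name are assumptions chosen for the example.

```python
import re


def web_file_name(description: str, ext: str = ".jpg") -> str:
    """Turn a descriptive phrase into a web-friendly file name."""
    # Drop apostrophes first so "Year's" becomes "years", not "year-s".
    text = description.lower().replace("'", "").replace("\u2019", "")
    # Collapse every run of other characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", text).strip("-")
    if not slug:
        raise ValueError("description has no usable characters")
    return slug + ext


print(web_file_name("New Year's Day, Times Square"))
```

The renamed file is a copy made for the website; the archived original keeps its permanent name.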

Aside from these situations, it’s good DAM practice to keep file names permanent because the file name is the foundation of the information structure.



The DAM Book, 2nd Edition by Peter Krogh
