O'Reilly logo

Database Design and Relational Theory by C.J. Date

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

THE PLACE OF DESIGN THEORY

To repeat something I said in the preface, by the term design I mean logical design, not physical design. Logical design is concerned with what the database looks like to the user (which means, loosely, what relvars exist and what constraints apply to those relvars); physical design, by contrast, is concerned with how a given logical design maps to physical storage.[8] And the term design theory refers specifically to logical design, not physical design—the point being that physical design is necessarily dependent on aspects (performance aspects in particular) of the target DBMS, whereas logical design is, or should be, DBMS independent. Throughout this book, then, the unqualified term design should be understood to mean logical design specifically, barring explicit statements to the contrary.

Now, design theory as such isn’t part of the relational model; rather, it’s a separate theory that builds on top of that model. (It’s appropriate to think of it as part of relational theory in general, but it’s not, to repeat, part of the relational model per se.) Thus, design concepts such as further normalization are themselves based on more fundamental notions—e.g., the projection and join operators of the relational algebra—that are part of the relational model. (All of that being said, it could certainly be argued that design theory is a logical consequence of the relational model, at least in part. In other words, it would be inconsistent to agree with the relational model in general but not to agree with the design theory that’s based on it.)

The overall objective of logical design is to achieve a design that’s (a) hardware independent, for obvious reasons; (b) operating system and DBMS independent, again for obvious reasons; and finally, and perhaps a little controversially, (c) application independent (in other words, we’re concerned primarily with what the data is, rather than with how it’s going to be used). Application independence in this sense is desirable for the very good reason that it’s normally—perhaps always—the case that not all uses to which the data will be put are known at design time; thus, we want a design that’ll be robust, in the sense that it won’t be invalidated by the advent of application requirements that weren’t foreseen at the time of the original design. Observe that one important consequence of this state of affairs is that we aren’t (or at least shouldn’t be) interested in making design compromises for physical performance reasons. Design theory should never be driven by performance considerations.

Back to design theory as such. As we’ll see, that theory includes a number of formal theorems, theorems that provide practical guidelines for designers to follow. So if you’re a designer, you need to be familiar with those theorems. Let me quickly add that I don’t mean you have to know how to prove those theorems (though in fact the proofs are often quite simple); what I mean is, you have to know what the theorems say—i.e., you have to know the results—and you have to be prepared to apply those results. That’s the nice thing about theorems: Once somebody’s proved them, their results become available for anybody to use whenever they need to.

Now, it’s sometimes claimed, not entirely unreasonably, that all design theory really does is bolster up your intuition. What do I mean by this remark? Well, consider the suppliers-and-parts database. The obvious design for that database is the one illustrated in Figure 1-1; I mean, it’s “obvious” that three relvars are necessary, that attribute STATUS belongs in relvar S, that attribute COLOR belongs in relvar P, that attribute QTY belongs in relvar SP, and so on. But why exactly are these things obvious? Well, suppose we try a different design; suppose we move the STATUS attribute out of relvar S, for example, and into relvar SP (intuitively the wrong place for it, since status is a property of suppliers, not shipments). Figure 1-2 below shows a sample value for this revised shipments relvar, which I’ll call STP to avoid confusion:[9]

Relvar STP—sample value

Figure 1-2. Relvar STP—sample value

A glance at the figure is sufficient to show what’s wrong with this design: It’s redundant, in the sense that every tuple for supplier S1 tells us S1 has status 20, every tuple for supplier S2 tells us S2 has status 30, and so on.[10] And design theory tells us that not designing the database in the obvious way will lead to such redundancy, and tells us also (albeit implicitly) what the consequences of such redundancy will be. In other words, design theory is largely about reducing redundancy, as we’ll see. (As an aside, I remark that—partly for such reasons—the theory has been described, perhaps a little unkindly, as a good source of bad examples.)

Now, if design theory really does just bolster up your intuition, then it might be (and indeed has been) criticized on the grounds that it’s really all just common sense anyway. By way of example, consider relvar STP again. As I’ve said, that relvar is obviously badly designed; the redundancies are obvious, the consequences are obvious too, and any competent human designer would “naturally” avoid such a design, even if that designer had no explicit knowledge of design theory at all. But what does “naturally” mean here? What principles are being applied by that human designer in opting for a more “natural” (and better) design?

The answer is: They’re exactly the principles that design theory talks about (the principles of normalization, for example). In other words, competent designers already have those principles in their brain, as it were, even if they’ve never studied them formally and can’t put a name to them or articulate them precisely. So yes, the principles are common sense—but they’re formalized common sense. (Common sense might be common, but it’s not always easy to say exactly what it is!) What design theory does is state in a precise way what certain aspects of common sense consist of. In my opinion, that’s the real achievement—or one of the real achievements, anyway—of the theory: It formalizes certain commonsense principles, thereby opening the door to the possibility of mechanizing those principles (that is, incorporating them into computerized design tools). Critics of the theory often miss this point; they claim, quite rightly, that the ideas are mostly just common sense, but they don’t seem to realize it’s a significant achievement to state what common sense means in a precise and formal way.

As a kind of postscript to the foregoing, I note that common sense might not always be that common anyway. The following lightly edited extract from a paper by Robert R. Brown of Hughes Aircraft[11] illustrates the point. The author begins by giving “a simplified real example”—his words—involving an employee file (with fields for employee number, employee name, phone number, department number, and manager name) and a department file (with fields for department number, department name, manager name, and manager’s phone number), all with the intuitively obvious meanings. Then he continues:

The actual database on which this example is based had many more files and fields and much more redundancy. When the designer was asked his reasons for such a design, he cited performance and the difficulty of doing joins. Even though the redundancy should be clear to you in my example, it was not that evident in the design documentation. In large databases with many more files and fields, it is impossible to find the duplications without doing extensive information analysis and without having extended discussions with the experts in the user organizations.

Incidentally, there’s another quote I like a lot—in fact, I used it as an epigraph in SQL and Relational Theory—that supports my contention that practitioners really do need to know the theoretical foundations of their field. It’s from Leonardo da Vinci (and is thus some 500 years old!), and it goes like this (I’ve added the boldface):

Those who are enamored of practice without theory are like a pilot who goes into a ship without rudder or compass and never has any certainty where he is going. Practice should always be based upon a sound knowledge of theory.



[8] Be warned, however, that other writers (a) use the terms logical design and physical design to mean something else and (b) use other terms to mean what I mean by them. Caveat lector.

[9] For obvious reasons I use T, not S, as an abbreviation for STATUS, here and throughout this book.

[10] You might notice another problem, too: The design can’t properly represent suppliers like supplier S5 who currently supply no parts at all. Such “update anomalies” are discussed in Chapter 3.

[11] Robert R. Brown: “Database Systems in Engineering: Key Problems, and Potential Solutions,” in the proceedings of a database symposium held in Sydney, Australia (November 15th-17th, 1984).

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required