View Updating and Relational Theory

Foreword

In the field of relational database theory and practice there have been two particularly thorny and controversial issues, neither of which has been resolved to everybody’s satisfaction: the missing information problem and the view updating problem. On the first of these, Chris Date has written copiously over the last 30 years or so; now he tackles the second one head on.

It’s not as though he hasn’t addressed the subject before, of course. His well known and widely used textbook, An Introduction to Database Systems, included material—well, a page or two, at any rate—on the subject in its very first edition, published in 1975. That page count grew to sixteen or so in the eighth edition (2004). His first whole chapter on the subject appeared in the book that started his long running Relational Database Writings series, in 1986. In the fourth book in that series, which appeared in 1995, he and David McGoveran gave us two chapters that showed evidence of a major shift in thinking on the issue, based on McGoveran’s work. That thinking then further evolved in an appendix in Databases, Types, and the Relational Model: The Third Manifesto (2007), through a chapter in Database Explorations (2010), and on to the present volume.

The basic idea, first mooted by E. F. Codd in 1969, has never changed. Assume we’re given a database consisting, by definition, of (a) some collection of relation variables or relvars,^[3] together with (b) a set of integrity constraints governing the permissible values of those relvars. Those given relvars are said to be the base ones. In general, the chosen design is one of several that could have been chosen to represent exactly the same information. From the chosen design we can derive an alternative one by defining virtual relvars, or views, in terms of relational expressions referencing the base relvars. For various reasons, such an alternative design—an alternative view of the database, in effect—might be considered more suitable than the base design for certain users. More importantly, that alternative design might actually exclude parts of the underlying or “real” database that some users have no interest in, or perhaps are not authorized to see. Moreover, if some change to the base design becomes necessary, virtual relvars representing the original design can be defined on the new design, such that existing users’ views of the database are immune to the change and potentially unpleasant upheavals are avoided. This is the basic idea behind the well known goal of logical data independence.

The thorny issues arise when users express database updates in terms of updates against the virtual relvars they see as constituting their database. How is the DBMS to determine the real updates to the real database that will cause the specified changes to occur in those virtual relvars? And if there are several ways of achieving the desired effect, which one should be chosen? For a simple example, suppose a user of the usual suppliers-and-parts database (described in detail in Chapter 1) sees a virtual relvar, or view, PS that shows only those suppliers that are located in Paris. The defining expression for view PS is, of course, S WHERE CITY = ‘Paris’. Now suppose that same user tells the DBMS to delete the tuple for supplier S2 from that view PS. Should the DBMS assume that supplier S2 no longer exists and delete the underlying tuple from base relvar S? Or should it reject the request as being ambiguous, considering that the same effect could be achieved by replacing supplier S2’s CITY value by something other than Paris? Moreover, suppose the user actually knows supplier S2 has moved to London and attempts to effect that change by “updating the tuple” for supplier S2 accordingly in view PS. Should the DBMS accept that update? Now suppose still further that view PS excludes the STATUS attribute. How should the DBMS react to an attempt by that user to insert tuples into that view, given that such tuples must necessarily omit values for STATUS?
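The ambiguity in the delete request can be made concrete with a small sketch. It models relvar S as a Python set of tuples and view PS as a function over it. The sample supplier values and the helper names (`ps_view`, `delete_from_base`, `move_supplier`) are assumptions introduced for illustration only. They are not taken from the book and do not represent Date's rules.

```python
# Illustrative sketch only: sample data and helper names are assumptions.
# Each tuple is (SNO, SNAME, STATUS, CITY).
S = {
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
}


def ps_view(s):
    """View PS: S WHERE CITY = 'Paris'."""
    return {t for t in s if t[3] == "Paris"}


def delete_from_base(s, sno):
    """Candidate A: remove the supplier's tuple from the base relvar."""
    return {t for t in s if t[0] != sno}


def move_supplier(s, sno, new_city):
    """Candidate B: keep the supplier but change its CITY value."""
    return {
        (n, name, status, new_city if n == sno else city)
        for (n, name, status, city) in s
    }


candidate_a = delete_from_base(S, "S2")
candidate_b = move_supplier(S, "S2", "London")

# Both base updates make S2 vanish from the view...
print(ps_view(candidate_a) == ps_view(candidate_b))
# ...yet they leave the base relvar in different states.
print(candidate_a == candidate_b)
```

Both candidates produce the same contents for PS, so the view alone cannot tell the DBMS which base update the user intended.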

These and many more are the kinds of questions Date attempts to answer in the detailed, thorough, careful, methodical analysis he now offers us. He lays out his plan of attack in the first three chapters. He clearly defines what it means for two database designs to be equivalent in the sense of representing the same information, and he then describes the methodology applied in the next ten chapters. That methodology entails examining each of the operators of the relational algebra in turn. For example, that “Paris suppliers only” view PS is what he calls a restriction view (i.e., a virtual relvar defined using just the restriction operator). Likewise, the view that excludes the STATUS attribute from PS is defined using projection. As this latter view is a projection of a restriction, we can infer the effects of updates on it by invoking Date’s rules for updating through projection to determine the effects on the underlying restriction, then invoking the rules for updating through restriction to determine the effects on the underlying base relvar S.
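The layered reasoning for a projection of a restriction might be sketched as follows. The two propagation rules coded here are simplified assumptions chosen only to show the shape of the composition. They are not a statement of Date's actual rules, and the sample data and function names are likewise hypothetical.

```python
# Illustrative sketch only: rules, data, and names are assumptions.
# Each base tuple is (SNO, SNAME, STATUS, CITY).
S = {
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
}


def restrict(s):
    """PS = S WHERE CITY = 'Paris'."""
    return {t for t in s if t[3] == "Paris"}


def project(ps):
    """PSX = PS with the STATUS attribute removed."""
    return {(sno, sname, city) for (sno, sname, _status, city) in ps}


def delete_via_projection(ps, target):
    """Assumed rule: drop every PS tuple whose projection equals target."""
    return {t for t in ps if (t[0], t[1], t[3]) != target}


def delete_via_restriction(s, ps_before, ps_after):
    """Assumed rule: tuples removed from PS are removed from S as well."""
    return s - (ps_before - ps_after)


# Step 1: map the delete on PSX down to the restriction PS.
ps_before = restrict(S)
ps_after = delete_via_projection(ps_before, ("S2", "Jones", "Paris"))

# Step 2: map the resulting change on PS down to the base relvar S.
new_S = delete_via_restriction(S, ps_before, ps_after)

print(project(restrict(new_S)))
```

Each step deals with exactly one operator, which is the point of examining the operators of the algebra one at a time.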

Applying the rules for a view whose definition involves several relational operations raises a very interesting and possibly controversial issue that Date addresses in Chapter 14: viz., if two expressions are syntactically distinct but logically equivalent (in the way that, for example, the numerical expressions x(y+z) and xy+xz are syntactically distinct but logically equivalent), should views defined on those expressions necessarily exhibit identical behavior with respect to update operations on them?

Now, some aspects of Date’s proposals proved to be controversial when they appeared in the 2007 and 2010 publications I mentioned earlier. For example, should a tuple inserted into a view defined on the union of R1 and R2 result in that tuple appearing in both R1 and R2? And should a tuple being deleted from a view defined on the intersection of R1 and R2 result in that tuple disappearing from both R1 and R2? I am on record as being one of those who expressed opposition to those particular proposals—this being, I hasten to add, the only serious technical disagreement between Date and myself that has arisen during our long period of collaboration. Those controversial details are retained here and Date has strengthened his rationale for them, though admitting that he might still fail to convince everybody who was against them. For my part, I found that his final chapter, Chapter 15, offers an intriguing possibility of light at the end of this particular tunnel. In it he describes in outline an idea, due to David McGoveran, for a radically different approach to the language we use for updating relational databases, effectively replacing—or at least extending—the familiar INSERT, DELETE, and UPDATE operators that have been with us in some form or other since prerelational times. Among the advantages claimed for this novel approach is that the problems giving rise to the controversy I have mentioned simply do not arise.

Date tells us that he does not expect or even wish this book to be the end of the story on view updating, but he hopes it will provide a firm basis on which the debate can move forward. I think that is exactly what he has provided, and I join him in that hope.

Hugh Darwen

Shrewley, England

2013

^[3]SQL would call those relvars tables. For further explanation of the terminology of relvars and related matters, see Chapter 2.

View Updating and Relational Theory by C.J. Date