O'Reilly logo

The Art of SQL by Peter Robson, Stephane Faroult

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 1. Laying Plans

Designing Databases for Performance

 

C'est le premier pas qui, dans toutes les guerres, décèle le génie.

It is the first step that reveals genius in all wars.

 
 --Joseph de Maistre (1754-1821) Lettre du 27 Juillet 1812 à Monsieur le Comte de Front

The great nineteenth century German strategist, Clausewitz, famously remarked that war is the continuation of politics by other means. Likewise, any computer program is, in one way or another, the continuation of the general activity within an organization, allowing it to do more, faster, better, or cheaper. The main purpose of a computer program is not simply to extract data from a database and then to process it, but to extract and process data for some particular goal. The means are not the end.

A reminder that the goal of a given computer program is first of all to meet some business requirement [*] may come across as a platitude. In practice, the excitement of technological challenges often slowly causes attention to drift from the end to the means, from upholding the quality of the data that records business activity to writing programs that perform as intended and in an acceptable amount of time. Like a general in command of his army at the beginning of a campaign, we must know clearly what our objectives are—and we must stick to them, even if unexpected difficulties or opportunities make us alter the original plan. Whenever the SQL language is involved, we are fighting to keep a faithful and consistent record of business activity over time. Both faithfulness and consistency are primarily associated with the quality of the database model. The database model that SQL was initially designed to support is the relational model . One cannot overemphasize the importance of having a good model and a proper database design, because this is the very foundation of any information system.

The Relational View of Data

A database is nothing but a model of a small part of a real-life situation. As any representation, a database is always an imperfect model, and a very narrow depiction of a rich and complex reality. There is rarely a single way to represent some business activity, but rather several variants that in a technical sense will be semantically correct. However, for a given set of processes to apply, there is usually one representation that best meets the business requirement.

The relational model is thus named, not because you can relate tables to one another (a popular misconception), but as a reference to the relationships between the columns in a table. These are the relationships that give the model its name; in other words, relational means that if several values belong to the same row in a table, they are related. The way columns are related to each other defines a relation, and a relation is a table (more exactly, a table represents one relation).

The business requirements determine the scope of the real-world situation that is to be modeled. Once you have defined the scope, you can proceed to identify the data that you need to properly record business activity. If we say that you are a used car dealer and want to model the cars you have for sale (for instance to advertise them on a web site), items such as make, model, version, style (sedan, coupe, convertible...), year, mileage, and price may be the very first pieces of information that come to mind. But potential buyers may want to learn about many more characteristics to be able to make an informed choice before settling for one particular car. For instance:

  • General state of the vehicle (even if we don't expect anything but "excellent")

  • Safety equipment

  • Manual or automatic transmission

  • Color (body and interiors), metallic paintwork or not, upholstery, hard or soft top, perhaps a picture of the car

  • Seating capacity, trunk capacity, number of doors

  • Power steering, air conditioning, audio equipment

  • Engine capacity, cylinders, horsepower and top speed, brakes (everyone isn't a car enthusiast who would know technical specifications from the car description)

  • Fuel, consumption, tank capacity

  • Current location of the car (may matter to buyers if the site lists cars available from a number of physical places)

  • And so on.. .

If we decide to model the available cars into a database, then each row in a table summarizes a particular statement of fact—for instance, that there is for sale a 1964 pink Cadillac Coupe DeVille that has already been driven twenty times around the Earth.

Through relational operations, such as joins, and also by filtering, selection of particular attributes, or computations applied to attributes (say computing from consumption and tank capacity how many miles we can drive without refueling), we can derive new factual statements. If the original statements are true, the derived statements will be true.

Whenever we are dealing with knowledge, we start with facts that we accept as truths that need no proof (in mathematics these are known as axioms , but this argument is by no means restricted to mathematics and you could call those unproved true facts principles in other disciplines). It is possible to build upon these true facts (proving theorems in mathematics) to derive new truths. These truths themselves may form the foundations from which further new truths emerge.

Relational databases work in exactly the same way. It is absolutely no accident that the relational model is mathematically based. The relations we define (which once again means, for an SQL database, the tables we create) represent facts that we accept, a priori, as true. The views we define, and the queries we write, are new truths that we prove.

Note

The coherence of the relational model is a critically important concept to grasp. Because of the inherent mathematical stability of the principles that underlie relational data modeling , we can be totally confident that the result of any query of our original database will indeed generate equally valid facts—if we respect the relational principles. Some of the key principles of the relational theory are that a relation, by definition, contains no duplicate, and that row ordering isn't significant. As you shall see in Chapter 4, SQL allows developers to take a number of liberties with the relational theory, liberties that may be the reasons for either surprising results or the failure of a database optimizer to perform efficiently.

There is, however, considerable freedom in the choice of our basic truths. Sometimes the exercise of this freedom can be done very badly. For example, wouldn't it be a little tedious if every time someone went to buy some apples, the grocer felt compelled to prove all Newtonian physics before weighing them? What must be thought of a program where the most basic operation requires a 25-way join?

We may use much data in common with our suppliers and customers. However, it is likely that, if we are not direct competitors, our view of the same data will be different, reflecting our particular perspective on our real-life situation. For example, our business requirements will differ from those of our suppliers and customers, even though we are all using the same data. One size doesn't fit all. A good design is a design that doesn't require crazy queries.

Important

Modeling is the projection of business requirements.



[*] The expression business requirement is meant to encompass non-commercial as well as commercial activities.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required