O'Reilly logo

Programming Visual Basic 2008 by Tim Patrick

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 4. Designing the Database

Data. Databases. It just kind of makes sense. If you have data, you need to put it somewhere. And what better place to put it than in a "data" base?

Just to make sure I had all the "bases" covered, I did a quick search on the Internet for a useful definition. What a shock. According to virtually every web site I found, a database is "a collection of data organized for easy retrieval by a computer." With a definition like that, pretty much everything I put on my system is stored in a database. All my disk files are organized for easy access. My saved emails can be sorted by subject or date received or sender, so they must be in a database, too. Even this document can be searched and sorted in any manner I wish. Is it a database?

Relational Databases

Perhaps that definition is too broad. These days, when we think of "database," it's generally a relational database system. Such databases are built on the "relational model" designed by Edgar Codd of IBM. In 1970, he issued "A Relational Model of Data for Large Shared Data Banks," the seminal paper on relational modeling, and later expanded on the basic concepts with C. J. Date, another "real programmer." Upon reading that 1970 paper—and if you have a free afternoon, you would really benefit from spending time with your family or friends rather than reading that paper—you will enter a world of n-tuples, domains, and expressible sets. Fortunately, you don't need to know anything about these terms to use relational database systems.

The relational databases that most programmers use collect data in tables, each of which stores a specific set of unordered records. For convenience, tables are presented as a grid of data values, with each row representing a single record and each column representing a consistent field that appears in each record. Table 4-1 presents a table of orders, with a separate record for each line item of the order.

Table 4-1. Boy, a lot of people drink coffee and tea

Record ID

Order ID

Customer ID

Customer Name

Product ID

Product

Price

Quantity

92231

10001

AA1

Al Albertson

BEV01COF

Coffee

3.99

3

92232

10001

AA1

Al Albertson

BRD05RYE

Rye bread

2.68

1

92233

10002

BW3

Bill Williams

BEV01COF

Coffee

3.99

1

92234

10003

BW3

Will Williams

BEV01COF

Tea

3.99

2

92235

10004

CC1

Chuck Charles

CHP34PTO

Potato chips

0.99

7

Putting all of your information in a table is really convenient. The important data appears at a glance in a nice and orderly arrangement, and it's easy to sort the results based on a particular column. Unfortunately, this table of orders has a lot of repetition. Customer names and product names repeat multiple times. Also, although the product ID "BEV01COF" indicates coffee, one of the lines lists it as "Tea." A few other problems are inherent in data that's placed in a single flat file database table.

Mr. Codd, the brilliant computer scientist that he was, saw these problems, too. But instead of just sitting around and complaining about them like I do, he came up with a solution: normalization. By breaking the data into separate tables with data subsets, assigning a unique identifier to each record/row in every table (a primary key), and making a few other adjustments, the data could be "normalized" for both processing efficiency and data integrity. For the sample orders in Table 4-1, the data could be normalized into three separate tables: one for order line items, one for customers, and one for products (see Table 4-2, Table 4-3, and Table 4-4, respectively). In each table, I've put an asterisk next to the column title that acts as the primary key column.

Table 4-2. The table of customers

Customer ID *

Customer Name

AA1

Al Albertson

BW3

Bill Williams

CC1

Chuck Charles

Table 4-3. The table of products

Product ID *

Product Name

Unit Price

BEV01COF

Coffee

3.99

BRD05RYE

Rye bread

2.68

BEV01COF

Coffee

3.99

CHP34PTO

Potato chips

0.99

Table 4-4. The table of order line items

Record ID *

Order ID

Customer ID

Product ID

Quantity

92231

10001

AA1

BEV01COF

3

92232

10001

AA1

BRD05RYE

1

92233

10002

BW3

BEV01COF

1

92234

10003

BW3

BEV01COF

2

92235

10004

CC1

CHP34PTO

7

To get combined results from multiple tables at once, join (or link) their matching fields. For instance, you can link the Customer ID field in the table of line items with the matching Customer ID primary key field in the table of customers. Once joined, the details for a single combined line item record can be presented with the matching full customer name. It's the same for direct joins with any two tables that have linkable fields. Figure 4-1 shows the relationships between the customer, product, and order line tables.

Three tables, and yet they work as one

Figure 4-1. Three tables, and yet they work as one

To join tables together, relational databases implement query languages that allow you to manipulate the data using relational algebra (from which the term relational database derives). The most popular of these languages, SQL, uses simple English-like sentences to join, order, summarize, and retrieve just the data values you need. The primary statement, SELECT, provides basic data selection and retrieval features. Three other common statements, INSERT, UPDATE, and DELETE, let you manipulate the records stored in each table. Together, these four statements make up the primary data manipulation language (DML) commands of SQL. SQL also includes data definition language (DDL) statements that let you design the tables used to hold the data, as well as other database features. I'll show examples of various SQL statements later in this chapter.

Vendor-specific systems such as Microsoft's SQL Server, Oracle's Oracle, Microsoft's Access, and IBM's DB2 extend these core DDL and DML features through additional data analysis and management tools. They also battle one another over important features such as data replication, crash-proof data integrity, the speed at which complex queries return the requested results, and who has the biggest private jet.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required