Chapter 1. Query Construction

Select queries are an essential part of any database system. These queries, which passively gather data (without changing the source data), are what we rely on to answer our questions about our data. In its most basic form, a select query merely returns records from a table verbatim. That’s not of much interest, since viewing the table itself would provide the same information. It’s when you add criteria, joins, and useful SQL functions and methods that select queries become valuable.

This chapter provides several tips for getting select queries to go the extra mile. Recipes in this chapter explain how to prompt for criteria at runtime, how to use logic operators to get the criteria just the way you need them, and how to handle duplicate records.

To make queries easier to read and work with, you’ll also find a recipe on using aliases, which provides a neat method to give nicknames to your tables. Another recipe explains how to use union queries to work around the problem of how to combine data from different tables so it can be treated as one source.

Finding Unmatched Records

Problem

I have a table that lists expenses incurred by employees. Some of these records do not match any records in the Employees table. How can I easily get a list of these unmatched expense records without having to examine every record in the table?

Solution

A special type of join called a left join (see Creating a Left Join) is used to identify records in one table that do not have matches within another table. The match, of course, has to be tested on a common field between tables—usually the unique key field of the parent table. The technique depends on having the criterion call for the matching field to be Null in the parent table. In other words, the query should return records from the child table in which no record (a Null) is found in the parent table.

Confused? Luckily, you can spare yourself the challenge of creating that query by using the Find Unmatched Query Wizard. The wizard will create the underlying SQL and run the query for you.

Figure 1-1 shows two tables: one lists employees, and the other lists expenses for which employees need to be reimbursed.

Employees and EmployeeReimbursements tables
Figure 1-1. Employees and EmployeeReimbursements tables

A number of records in the Employee Reimbursements table are “orphan” records— that is, they do not match any employee records in the table on the left (the parent table). The Find Unmatched Query Wizard will identify these records for you. From the Query tab in the Access database window, click the New button, or use the Insert → Query menu option to display the New Query dialog box shown in Figure 1-2. Select Find Unmatched Query Wizard, and click the OK button.

The wizard runs through a few screens. You’ll need to:

  1. Select the table or query that contains the records you want to identify. In this example, the EmployeeReimbursements table contains the records of interest (that is, the records that have no matches to the employee records themselves).

  2. Select the table that contains the records to match against.

  3. From each table, select the field to match on. Often this is the key field in one table and a foreign key in the other table.

  4. Select which fields from the table or query chosen in the first step should be included in the returned records.

Selecting the Find Unmatched Query Wizard
Figure 1-2. Selecting the Find Unmatched Query Wizard

Figure 1-3 shows the returned records from the EmployeeReimbursements table that do not have matches in the Employees table, based on the EmployeeID field.

Unmatched records have been identified
Figure 1-3. Unmatched records have been identified

Discussion

The wizard assembled this SQL statement:

	SELECT EmployeeReimbursements.*
	FROM EmployeeReimbursementsLEFT JOIN
	Employees ON
	EmployeeReimbursements.EmployeeID =
	Employees.EmployeeID
	WHERE (((Employees.EmployeeID) Is Null));

The SQL looks for records that do not exist in the matching table (i.e., that return a Null). It is not possible to include any fields from the matching table because no records are returned from the matching table; all the returned fields are from the table in which unmatched records are expected.

Making AND and OR Do What You Expect

Problem

Logic operators are not conceptually difficult to follow, but combining and nesting them does add complexity. If you don’t construct complex SQL statements very carefully, they may return incorrect or incomplete results, sometimes without reporting any errors.

Solution

Logic operators provide the flexibility to construct criteria in any way that suits your requirements. The AND operator returns true when all conditions are met; the OR operator returns true as long as one condition is met. In terms of how this applies to SQL construction, OR is used to set criteria for which one condition must be met, while AND is used to set criteria for which all the conditions must be met. Some examples are presented in Table 1-1.

Table 1-1. Examples of using logic operators

SQL statement

Description

SELECT DISTINCT State, City,
Count(LastName) AS Customers
FROM tblCustomers
GROUP BY State, City
HAVING State="NY" AND City="Yonkers"

This gives a count of customers located in Yonkers, NY. Only customer records in which both the state is New York and the city is Yonkers are counted.

SELECT DISTINCT State, City,
Count(LastName) AS Customers
FROM tblCustomers
GROUP BY State, City
HAVING State="NY"AND City="Yonkers" OR
City="Albany"

This gives a count of customer records for which the state is New York and the city is either Yonkers or Albany.

This produces an unintended result. The OR statement does not properly apply to both Yonkers and Albany. Any Yonkers customers must be in New York, but the way this SQL statement is constructed, Albany customers do not have to be in New York. Consequently, as Figure 1-4 shows, customers in Albany, GA will also be returned.

SELECT DISTINCT State, City,
Count(LastName) AS Customers
FROM tblCustomers
GROUP BY State, City
HAVING State="NY" AND (City="Yonkers" OR
City="Albany")

This correctly returns customer records for customers located only in Yonkers, NY and Albany, NY. Enclosing the cities and the OR operator in parentheses ensures that both cities must also match the state of New York on a record-by-record basis.

The second query returns all Albany customers
Figure 1-4. The second query returns all Albany customers

Discussion

OR is applied amongst records; AND is applied across fields. What does this mean? Figure 1-5 shows the tblCustomers table that is used as the example in this recipe. The OR operation involves evaluating the value in a particular field in each record. A single record cannot contain both Albany and Yonkers in its City field; it can contain at most one of those values. So, searching for customers in Albany or Yonkers requires looking for these values in the City field of each record (or, in our example, at least those records in which the state is New York). Thought of another way, when using OR, you can apply the statement multiple times to the same field. For example:

	City="Albany" OR City="Syracuse" Or City="Yonkers"

The AND operator, however, is not used on the same field. A SQL condition like this:

	City="Albany" AND City="Yonkers"
Each customer is in a single city
Figure 1-5. Each customer is in a single city

would make no sense. No records can be returned because there cannot be any records in which the single City field holds two values. Instead, AND is applied to pull together the values of two or more fields, as in:

	State="New York" AND City="Yonkers"

The query grid in Access is flexible enough to handle any combination of OR and AND operators. Figure 1-6 shows how the grid is used to return customer records from New York where the customer type is Retail or Wholesale, as well as customer records from Florida where the customer type is Internet or Mail Order. Internet and Mail Order customers from New York will not be returned, nor will Retail or Wholesale customers from Florida.

Along a single Criteria row, all of the conditions set in the different fields must be met (i.e., this is an AND operation). The SQL statement Access generates bears this out:

	SELECT [FirstName] & " " & [LastName] AS Customer,
	City, State, CustomerType
	FROM tblCustomers
	WHERE
	(((State)="NY") AND
	((CustomerType)="Retail" Or (CustomerType)="Wholesale"))
	OR
	(((State)="FL") AND
	((CustomerType)="Mail Order" Or (CustomerType)="Internet"))
	ORDER BY tblCustomers.CustomerType;
Applying AND and OR in the query grid
Figure 1-6. Applying AND and OR in the query grid

As you can see, the SQL condition for NY is followed by AND to get Retail and Wholesale customers from that state.

Working with Criteria Using the IN Operator

Problem

Using multiple OR operators in the query grid makes for an unmanageable experience. If too many values and ORs are placed in a grid column, the column may expand to be bigger than the viewable area.

Solution

A way to save space in the query grid is to use the IN operator. IN is used in conjunction with a list of values from which any value can be returned. This essentially means that the IN operator works in the same fashion as the OR operator. It is not required that all conditions be met; meeting one of the conditions suffices.

Here is a SQL statement that returns records for students that took at least one of the listed courses:

	SELECT Students.Student, Student_Grades.Course,
	Student_Grades.Instructor
	FROM Students INNER JOIN Student_Grades ON
	Students.StudentID = Student_Grades.StudentID
	WHERE
	(((Student_Grades.Course)="Beginner Access"))
	OR
	(((Student_Grades.Course)="Beginner Excel"))
	OR
	(((Student_Grades.Course)="Advanced Access"))
	OR
	(((Student_Grades.Course)="Advanced Excel"));

Using IN provides a more streamlined SQL statement. Notice how the WHERE section has shrunk:

	SELECT Students.Student, Student_Grades.Course,
	Student_Grades.Instructor
	FROM Students INNER JOIN Student_Grades ON
	Students.StudentID = Student_Grades.StudentID
	WHERE Student_Grades.Course In
	("Beginner Access","Beginner Excel",
	"Advanced Access","Advanced Excel");

Discussion

The IN operator provides a syntax convenience. It makes it easier to eyeball a set of criteria values to which OR logic is applied. Figure 1-7 shows an example of using IN to return records where the instructor is either Brown or Maxwell.

Using the IN operator to specify the instructor
Figure 1-7. Using the IN operator to specify the instructor

That’s simple enough to follow: when the instructor is either Brown or Maxwell, the record is returned. Figure 1-8 shows an example of using IN in two fields.

The example shown in Figure 1-8 returns records in which either Brown or Maxwell taught Beginner Access, Advanced Access, or Intro to VBA. In other words, all combinations of these instructors and courses are returned.

Using the IN operator for both the Instructor and Course fields
Figure 1-8. Using the IN operator for both the Instructor and Course fields

Adding criteria to other fields will further cut down the number of returned records. The next example adds new criteria to the row. The Instructor and Course fields still have IN operators, but now only records that have a MidTerm Grade and a Final Grade of 85 or better are returned. Here is the SQL statement for this query:

	SELECT Student_Grades.Instructor, Student_Grades.Course,
	Students.Student, Student_Grades.[MidTerm Grade],
	Student_Grades.[Final Grade]
	FROM Students INNER JOIN Student_Grades ON
	Students.StudentID = Student_Grades.StudentID
	WHERE (((Student_Grades.Instructor) In
	("Brown","Maxwell")) AND ((Student_Grades.Course) In
	("Beginner Access","Advanced Access","Intro to VBA")) AND
	((Student_Grades.[MidTerm Grade])>=85) AND
	((Student_Grades.[Final Grade])>=85))
	ORDER BY Student_Grades.Course, Students.Student;

The IN operator is handy when using subqueries. A subquery returns a set of records to which the rest of a query can apply further criteria. The following SQL statement returns information for those students who got a 90 or better in either Advanced Access or Advanced Excel and took either Beginner Access or Beginner Excel last year:

	SELECT Student_Grades.Instructor, Student_Grades.Course,
	Students.Student, Student_Grades.[MidTerm Grade],
	Student_Grades.[Final Grade]
	FROM Students INNER JOIN Student_Grades ON
	Students.StudentID = Student_Grades.StudentID
	WHERE (((Student_Grades.Course) In
	("Advanced Access","Advanced Excel")) AND
	((Student_Grades.[Final Grade])>=90) AND
	((Students.StudentID) In
	(Select Stud_ID From LastYear Where
	(Course="Beginner Access") Or (Course="Beginner Excel"))))
	ORDER BY Student_Grades.Course, Students.Student;

The IN operator is applied to the LastYear table through a subquery. Here is the portion of the SQL that does this:

	((Students.StudentID) In
	(Select Stud_ID From LastYear Where
	(Course="Beginner Access") Or (Course="Beginner Excel"))))

The Select statement within the larger SQL statement is where the subquery starts. The subquery returns StudentIDs that have matches in the LastYear table (on the Stud_ID field) for those students who took Beginner Access or Beginner Excel.

Excluding Records with the NOT Operator

Problem

I have a large number of client names in my data. I need to return a list of clients that are not on the Hold list. Most clients are OK, so most will be returned in the query. How do I keep out the few clients who are on hold?

Solution

The method here is to exclude records from being returned, rather than the typical approach of identifying records that are to be returned. Figure 1-9 shows two data-base tables. The table on the left is a list of client orders. The table on the right is a list of clients (by ClientID) who are “on hold”—that is, clients whose accounts are in arrears and whose orders should not be shipped. Running a query that causes the clients identified in the OnHold table to be excluded from the Clients table is the key to this recipe.

A subquery works well here to gather the records from the second table into the query result. Using the NOT operator provides the twist to make the records excluded instead of included.

The NOT operator is placed in front of the subquery to reverse the logic. If NOT were left out, the query would return records that match in both tables. When NOT is applied, only those records from the Clients table that do not have matching records in the OnHold table are returned. Here is the SQL statement:

	SELECT Clients.ClientID, Clients.Client,
	Clients.OrderDate, Clients.OrderAmount
	FROM Clients
	WHERE (((Clients.ClientID)
	NOT In (Select ClientID from OnHold)));
A table of clients and a table of clients on hold
Figure 1-9. A table of clients and a table of clients on hold

Discussion

NOT is a logic operator that reverses a Boolean state, so NOT true equals false, and NOT false equals true. When a query calls for matching criteria, preceding the criteria construct with NOT flips this around and calls for records that specifically do not match the criteria.

Our sample Clients table has 200 records, and the OnHold table has 8 records. The result is that the query returns 192 records—that is, all orders for clients who are not on hold.

Parameterizing a Query

Problem

I need to construct a query that takes a criterion, but the criterion’s value will not be known until the query is run. When it’s time to run the query, the user needs a way to supply the criterion without going into the design of the query.

Solution

A query can be designed to accept parameters at the time it is run. Typically, an input box will appear in which the user enters the value for the criterion. A query can have any number of criteria entered in this fashion. A set of brackets defines the question asked in the input box. The brackets and the prompt to the user are placed in the Criteria row of the query grid for the given field. For example, using "[Enter an age]” as the criterion for a field instructs Access to present this prompt in a dialog box, as shown in Figure 1-10.

Prompting the user to enter a parameter into a query
Figure 1-10. Prompting the user to enter a parameter into a query

Discussion

When a query is run, a traditional form is often displayed to enable users to enter parameter values or make selections from a list. But the ability to place parameters directly in the structure of a query provides a great alternative to having to build a form that gathers input. When the criteria are simple, just using brackets in the query design will suffice.

Figure 1-11 shows the query design that prompts the user to enter an age. When the query is run, the dialog shown in Figure 1-10 will appear, and the returned records will be filtered to those that match the entered value.

The design of the query with the age parameter
Figure 1-11. The design of the query with the age parameter

Here is the actual SQL statement that is built using the query grid:

	SELECT Name_City_Age.ID, Name_City_Age.FirstName,
	Name_City_Age.LastName, Name_City_Age.City,
	Name_City_Age.Age
	FROM Name_City_Age
	WHERE (((Name_City_Age.Age)=[Enter an age]));

Note that in the WHERE clause the phrase “Enter an age” appears enclosed in brackets.

Tip

Although the phrase “Enter an age” is used here to define the criterion for a field named Age, there is no strict requirement to use the word “age” in the bracketed phrase. We could just as well have used “Enter a number”; it wouldn’t matter because the text in the brackets does not have to contain the name of the field for which it is used.

A query can have multiple parameters, and these parameters fit in with the structure of the SQL WHERE clause. A common criterion structure is to use a range of values to determine which records to return. In the current example, a query might need to return all records that fit within a range of ages. The Between/And SQL construct is used for this purpose. Figure 1-12 shows the modification in the query design.

A query that uses two parameters to filter a single field
Figure 1-12. A query that uses two parameters to filter a single field

Here’s the updated SQL:

	SELECT Name_City_Age.ID, Name_City_Age.FirstName,
	Name_City_Age.LastName, Name_City_Age.City,
	Name_City_Age.Age
	FROM Name_City_Age
	WHERE (((Name_City_Age.Age) Between
	[Enter the lowest age] And
	[Enter the highest age]));

When this query is run, two prompts will appear: one asks for the lowest age, and the other asks for the highest age. Figure 1-13 shows a sample of returned records when the range was defined as between the ages of 20 and 40.

Returned records for the age range 20–40
Figure 1-13. Returned records for the age range 20–40

The SQL Like operator can also be used with a bracketed prompt. Like is used with a wildcard to return records in which the criterion fits a pattern. For example, in a query that returns all those whose last names start with the letter D, the WHERE portion of the SQL statement looks like this:

	WHERE (((LastName) Like "D*"));

Using the Like operator with a parameter prompt requires the brackets, of course, and careful placement of the wildcard character (*) and the quotation marks, as follows:

	WHERE (((LastName) Like
	[Enter the first letter of the last name: ] & "*"));

Figure 1-14 shows how this is entered in the query grid.

To return a smaller set of results, you can match on a more complex pattern; for example, the user can enter “De” to have names such as Deere returned, but not names such as Dole. In this case, you’ll need to adjust the phrasing of the prompt accordingly. Phrasing prompts correctly is as much art as it is SQL.

The example here uses an asterisk wildcard. Any number of characters can be returned in place of that wildcard, but the character(s)entered as the parameter are what fine-tunes the record filtering.

Specifying a data type for the parameter

In certain situations, you must indicate the data type of the parameter. You do this when:

Using the Like operator
Figure 1-14. Using the Like operator
  • Using a crosstab query

  • Using a query as the source for a chart

  • Prompting for Boolean (true/false) values

  • Prompting for fields from a table in an external database

Parameter data types are entered in the Query Parameters dialog (see Figure 1-15).

Using the Query Parameters dialog to establish data types
Figure 1-15. Using the Query Parameters dialog to establish data types

To display the dialog, select the Query → Parameters menu option. In the left side of the dialog, enter the prompts that you’ve established in the design grid. Then select the data types in the right side of the dialog.

Returning a Top or Bottom Number of Records

Problem

I have a large table of data that contains thousands of records and several dozen fields. I create models based on various fields and/or ranges of values in the fields. I use queries to set up the sums using SQL aggregates and expressions. This is exactly what I need, but the problem is that the number of records slows down the processing. When I’m testing calculations, I don’t need all the records. How can I pull out just a handful of them to use for testing?

Solution

The SQL TOP predicate is just what is called for here. It lets you specify how many records to return, either as an exact number or as a percentage of the total number of records in the underlying table or query.

Let’s say you have a standard select query such as the one shown in Figure 1-16. The SQL statement is:

	SELECT SampleNum, Identifier, Fact1, Fact2,
	Fact3, Fact4, Fact5, Fact6, Fact7, Fact8
	FROM ConsumerTrendData;
A simple select query returns all records
Figure 1-16. A simple select query returns all records

To specify a subset of records to search through to test the query—say, 40—use the TOP predicate, as follows:

	SELECT TOP 40 SampleNum, Identifier, Fact1,
	Fact2, Fact3, Fact4, Fact5, Fact6,
	Fact7, Fact8
	FROM ConsumerTrendData;

TOP comes directly after the SELECT keyword, and is followed by the number of records to return. Instead of reducing the number of returned records based on criteria, TOP reduces the number of returned records without any bias.

When working with the Access query grid, you can opt to use TOP by going into the query properties. To do this, use the View → Properties menu option while designing the query. The properties sheet that opens may display the properties for a field. If this is the case, click on the top pane of the query designer (above the grid)but not on any tables—in other words, click on the empty area. This will ensure that the properties sheet displays the query properties (see Figure 1-17).

The Query Properties sheet
Figure 1-17. The Query Properties sheet

One of the properties is Top Values. In Figure 1-17, you can see that the value of 40 is already entered.

Discussion

To return a percentage of the records, you can place a percent sign (%) after the entered number in the Top Values property on the properties sheet, or you can enter the word PERCENT directly in the SQL statement. Here, for example, is the SQL to return the top 20 percent of the records:

	SELECT TOP 20 PERCENT SampleNum, Identifier,
	Fact1, Fact2, Fact3, Fact4, Fact5,
	Fact6, Fact7, Fact8
	FROM ConsumerTrendData;

Using TOP to return the “top” X number of records begs the question of what makes the hierarchy of records in a table. Only the application of an index or sort provides any structure to the records. We often use AutoNumber fields, which order the records. But what happens when we sort on another field? The “top” records change.

Using the TOP predicate requires that the use of a sort, or lack thereof, always be considered. Here is an example of returning the top five records of a sorted table:

	SELECT TOP 5 SampleNum, Identifier, Fact1,
	Fact2, Fact3, Fact4, Fact5, Fact6,
	Fact7, Fact8
	FROM ConsumerTrendDataORDER BY Identifier;

Now that we’ve sorted the data in ascending order (the default sort direction)with the ORDER BY clause, asking for the top five records has some relevance. Turning this upside down would provide the bottom five records. But how could we do that? There is no “bottom” predicate. Instead, we simply change the sort to descending using the DESC keyword:

	SELECT TOP 5 SampleNum, Identifier, Fact1,
	Fact2, Fact3, Fact4, Fact5, Fact6,
	Fact7, Fact8
	FROM ConsumerTrendData
	ORDER BY Identifier DESC;

This example requests a descending sort on the Identifier field. Requesting the top five records will now return what were the bottom five records when we did an ascending sort. Figure 1-18 shows the results of running these two queries. The sort on the Identifier field is ascending in one query and descending in the other.

Ascending and descending sorts
Figure 1-18. Ascending and descending sorts

Returning Distinct Records

Problem

When running select queries, you may need to control whether duplicate records are returned in the query result. However, there could be disagreement about what constitutes uniqueness and duplication. Often, a few fields may contain duplicate information among records, and it’s the additional fields that bring unique values to the records. How can queries be managed with regard to controlling how duplicate information is handled?

Solution

Figure 1-19 shows a table in which there are records that are near duplicates. None are exact duplicates since the CustomerID field ensures uniqueness. However, the two records for Vickie Storm could be seen as duplicates, as all fields except the CustomerID field hold duplicate information. The records for Ebony Pickett also contain some duplicate information, although two different cities are listed.

A table with duplicates
Figure 1-19. A table with duplicates

SQL provides ways of handling how records such as these are returned or excluded when select queries are run. Access makes use of the SQL predicates Distinct and DistinctRow:

Distinct

Bearing in mind that not all fields need to be included in a select query, Distinct will exclude duplicates when the duplication occurs within just the selected fields, regardless of whether the complete set of record fields would prove the records to be unique.

DistinctRow

DistinctRow is used to manage duplicates in a query that joins tables. Assuming unique records in the parent table, DistinctRow lets you avoid having duplicates returned from the child table.

You can incorporate these predicates by using the query designer or writing them directly into the SQL statement. With a query in design mode, use the View → Properties menu option to display the Query Properties dialog box, shown in Figure 1-20. Two properties are of interest here: Unique Values and Unique Records. These can both be set to No, but only one at a time can be set to Yes.

Setting the Unique Values and Unique Records properties
Figure 1-20. Setting the Unique Values and Unique Records properties

Setting Unique Values to Yes places the DISTINCT predicate in the SQL statement. For example:

	SELECT DISTINCT Customers.FirstName,
	Customers.LastName, Customers.Address,
	Customers.City, Customers.State
	FROM Customers;

Similarly, setting the Unique Records property to Yes places the DISTINCTROW predicate just after the SELECT keyword.

Discussion

For our sample table, a simple select query of the Customers table on just the FirstName and LastName fields would return nine records, without regard to the fact that the returned results would show two records for Ebony Pickett and two records for Vickie Storm. Using Distinct in the SQL statement will change the returned count to seven records. In particular, this SQL statement:

	Select Distinct FirstName, LastName From Customers
	Order By LastName

produces the result shown in Figure 1-21.

Distinct records are returned
Figure 1-21. Distinct records are returned

When the City and State fields are added to the SQL statement, like this:

	Select Distinct FirstName, LastName, City, State
	From Customers
	Order By LastName

eight records are returned (see Figure 1-22). The additional record appears because Ebony Pickett is listed in two unique cities. As far as the query goes, there are now two unique Ebony Pickett records, and they are both returned. Vickie Storm still has just one record returned, however, because the source data for her city and state are identical in both of her records.

Using DistinctRow

Now, let’s take a closer look at using DistinctRow, which manages duplicates in multitable joins. Figure 1-23 shows two tables: a Customers table (this table does not contain any duplicates) and a table of purchases related back to the customers.

Distinct records are returned based on additional fields
Figure 1-22. Distinct records are returned based on additional fields
Customers and Purchases tables
Figure 1-23. Customers and Purchases tables

Say you want to find out which customers have placed orders. A SQL statement that joins the tables but does not use DistinctRow will return a row count equivalent to the number of records in the child (Purchases)table. Here is a simple SQL statement that returns the names of the customers who placed each of the orders:

	SELECT Customers.CustomerID, Customers.FirstName,
	Customers.LastName
	FROM Customers INNER JOIN Purchases ON
	Customers.CustomerID = Purchases.CustomerID;

The result of running this query is shown in Figure 1-24. No fields from the Purchases table have been included, but the effect of the multiple child records is seen in the output—a customer name is listed for each purchase.

The simple query returns duplicate master records
Figure 1-24. The simple query returns duplicate master records

Adding the DistinctRow predicate ensures that the returned master records are free of duplicates:

	SELECT DistinctRow Customers.CustomerID,
	Customers.FirstName, Customers.LastName
	FROM Customers INNER JOIN Purchases ON
	Customers.CustomerID = Purchases.CustomerID;

The result is shown in Figure 1-25.

Using DistinctRow avoids duplicate records
Figure 1-25. Using DistinctRow avoids duplicate records

Returning Random Records

Problem

For efficient analysis work, I need to pull random records out of my source table. Each time I run a query, I’d like to have the records returned in an unknown order.

Solution

The technique to apply here is to sort the records on a random value using the Rnd function. Figure 1-26 shows a table with three fields. To return the records in a random order, pass the name of one of the fields as the argument to the Rnd function in the ORDER BY clause of the SQL statement.

For example, using the Temperature field, the SQL statement necessary to return the records in random order is:

	SELECT Samples.Location, Samples.Temperature, Samples.Date
	FROM Samples
	ORDER BY Rnd(Samples.Temperature);

Figure 1-27 shows the result of running the query. Bear in mind that each time the query is run, the records will be returned in a different order.

Discussion

Using the Rnd function on one field while performing an ascending or descending sort on another field provides an interesting, sometimes useful result. For example, this SQL statement performs sorts on two fields (one ascending and one random):

	SELECT Samples.Location, Samples.Temperature,
	Samples.Date
	FROM Samples
	ORDER BY Samples.Location, Rnd(Samples.Temperature);
A table from which random records are required
Figure 1-26. A table from which random records are required
Queried records are returned in a random order
Figure 1-27. Queried records are returned in a random order

Figure 1-28 shows the result of running this query. An ascending sort is done on the Location field, so Facility A records float to the top. However, the temperatures are sorted randomly. Thus, each time this query is run, all the Facility A records will be on top, but the Facility A records will be randomly sorted based on the way the Temperature field is handled.

One field is sorted in ascending order and another is randomly sorted
Figure 1-28. One field is sorted in ascending order and another is randomly sorted

Fine-Tuning Data Filtering with Subqueries

Problem

I need to determine which records in a table have above-average values for a particular quantitative field. How can I calculate the average and filter the records in one query?

Solution

The AVG aggregate function calculates the average value of a field across the records included in a query. While that is a straightforward operation, comparing the value in each record to the average presents a challenge. One way to do this is to use a subquery. A subquery is literally a query within a query, typically located within the WHERE section of the main query.

Figure 1-29 shows a table of teams and their scores for the season. The task is to identify which teams have a season score that is greater than the average of all the scores.

A table of teams and scores
Figure 1-29. A table of teams and scores

A little finesse with SQL is required to identify the teams that beat the average. The AVG aggregate function (see Finding the Sum or Average in a Set of Data)is needed, but it is not applied in the typical way in the query grid. unmatched_Figure 1-30 shows how the query is entered into the query grid. Select View → Totals while designing the query to display the Total row in the grid. Then, create an additional column in the grid based on the SeasonScore field. Don’t select AVG in the Total row; instead, select Where from the drop-down list, and enter the subquery in the Criteria row.

A subquery design in the query grid
Figure 1-30. A subquery design in the query grid

In this example, the greater-than sign (>) precedes the subquery, since we are looking for scores that are greater than the average. The AVG function appears in the subquery itself, which has the following syntax:

	Select AVG(SeasonScore) From SeasonScores

Here’s the full SQL statement:

	SELECT Team, SeasonScore
	FROM SeasonScores
	WHERE (((SeasonScore)>
	(Select AVG(SeasonScore) From SeasonScores)))
	GROUP BY Team, SeasonScore
	ORDER BY Team;

Discussion

In the preceding example, the subquery resides in the WHERE section of the outer query. An alternative is to have the subquery act as one of the fields in the outer query’s SELECT section. Figure 1-31 shows two tables and a query. On the left is the SeasonScores table presented earlier. On the right is a related child table that lists game dates and locations for each team in the first table. The query, whose result is shown underneath the tables, has returned the date of the last game played by each team.

A subquery that queries a second table
Figure 1-31. A subquery that queries a second table

Here is the SQL statement of the Last Game Played Per Team query in Figure 1-31:

	SELECT SeasonScores.Team,
	(Select Max(Date) From Games Where
	Games.TeamID = SeasonScores.TeamID)
	AS [Date Of Last Game]
	FROM SeasonScores;

The subquery is placed where a field would typically go. It is encased in parentheses and is given an alias for the field name (Date Of Last Game) outside of the subquery. Within the subquery is the join between the two tables.

Tip

There are other ways to return the same information, using different query constructs—for example, the tables could be joined and a Max of Date could be used. Working it in as a subquery serves here to illustrate a new method.

Removing excessive queries

One of the advantages of using subqueries is the avoidance of nested queries. Let’s look at an example. Say you have two tables: tblCustomers, which contains customer names; and tblCustomerReachOut, which contains the dates at which customers were contacted. The CustomerID field is present in both tables, as shown in Figure 1-32. Now, consider the task of gathering a list of customers who have not been contacted in more than 60 days.

A table of customers and a table of contact dates
Figure 1-32. A table of customers and a table of contact dates

Let’s look at a common approach to extracting the desired information. First, a query is constructed that returns—from the table of contact dates—records in which the last contact date is more than 60 days from the present date, as shown in Figure 1-33.

A query that returns records based on elapsed time
Figure 1-33. A query that returns records based on elapsed time

The query in Figure 1-33 is then queried from another query. This next query effectively takes the records from the first query and matches them with customer names from the tblCustomers table. This second query is shown in Figure 1-34.

A query that uses the returned records of another query
Figure 1-34. A query that uses the returned records of another query

To summarize, two queries have been constructed, though one would do the trick. When this inefficient design is perpetuated repeatedly in a database, the database can become laden with dozens of queries that seem isolated on the surface. When viewing a list of all the queries in a database (on the Queries tab), there is no immediate method to know which queries are called by others. Nested queries do work, but they’re often unnecessary.

In contrast to the approach just described, here is the SQL of a query/subquery construction that returns the same records:

	SELECTDISTINCT tblCustomers.CustomerFirstName,
	tblCustomers.CustomerLastName, tblCustomers.CustomerAddr1
	FROM tblCustomers
	WHERE (((tblCustomers.CustomerID)
	In (Select tblCustomerReachOut.CustomerID FROM tblCustomerReachOut
	WHERE DateDiff("d",[ContactDate],Now())>60)));

This SQL statement uses a subquery to return the CustomerIDs from the tblCustomerReachOut table for customers who were last contacted more than 60 days earlier (the DateDiff function is used to calculate the elapsed time). The returned CustomerIDs are matched with related records in the tblCustomers table, thereby returning the customer names. Note that the SQL statement uses the DISTINCT predicate (discussed in Returning Distinct Records), as the returned records from the tblCustomerReachOut table can include duplicate CustomerIDs. This makes sense because customers are likely to be contacted more than once. Using the DISTINCT predicate ensures that the final returned list of names will not contain duplicates.

Combining Data with Union Queries

Problem

I need to combine sets of data so I can run analyses on them. The sets of data are identical but sit in different tables. There’s no way to combine the data in the query grid. I could use append queries to copy data from the various tables to a master table, but this is inefficient. The data in the smaller tables changes from time to time, and having to rerun the appends is a nuisance. Isn’t there a way to simply combine the data at any time as needed, so the latest data in the smaller tables is always present?

Solution

A union query is the perfect vehicle for combining identically structured data. To create a union query, place Union SQL clauses between the Select statements that query the tables.

Figure 1-35 shows three tables with an identical structure. Let’s take a look at how to combine the data from these three tables.

Union queries must be written in the SQL pane of the query designer. It is not possible to represent them in the query grid. Here’s a SQL statement written in the SQL pane of the query designer:

	SELECT * From SeasonScores_Putnam
	Union
	SELECT * From SeasonScores_Rockland
	Union
	SELECT * From SeasonScores_Westchester;
Three tables with identically structured data
Figure 1-35. Three tables with identically structured data

Running the query returns a single set of data, shown in Figure 1-36.

The result of running a union query
Figure 1-36. The result of running a union query

All the records from the three tables are now together in one place. This query can be saved and then used as the source for other queries and further analysis. For example, this saved query can be used in a query that calculates an average or some other summarization. If and when any data changes back in the source tables, the new data will flow through to the output of this union query, as each time it is rerun, it uses the latest data from the source tables.

Discussion

A hard-and-fast rule is that all the selects feeding into a union query must have the same number of fields. In the previous example, this was a given because the three source tables were identical in structure. However, imagine assembling a list of names from various data tables, such as a Contacts table, a Prospects table, and a HolidayList table.

Figure 1-37 shows the design of these three tables. Each table has a field for a first name and a last name, although the fields are not named exactly the same. Also note that the number of fields is not consistent among all the tables. To avoid this being an issue, you must specify actual field names in the Select statements and ensure that you specify the same number of fields from each table.

Three tables with similar information
Figure 1-37. Three tables with similar information

A union SQL statement that will combine the first and last names from these tables can be written like this:

	SELECT FirstName, LastName From Contacts;
	Union
	Select [First Name], [Last Name] From HolidayList
	Union
	Select [Prospect First Name], [Prospect Last Name]
	From Prospects
	Order By LastName, FirstName

Tip

Brackets ([]) must be placed around field and table names that contain spaces.

The result of the query is shown in Figure 1-38. The field names presented in the query (FirstName and LastName) are taken from the first Select statement.

A union query based on three tables
Figure 1-38. A union query based on three tables

While each source table on its own may be free of duplicates, it is possible that some duplicates will occur in the combined output. For example, the same person might be in the Contacts table and the Prospects table. SQL provides a way to handle duplicates that appear when union queries are run.

By default, a union query will drop duplicates. If you want to include them in the result, you’ll need to use the Union All construct, as shown here:

	SELECT FirstName, LastName From Contacts;
	Union All
	Select [First Name], [Last Name] From HolidayList
	Union All
	Select [Prospect First Name], [Prospect Last Name]
	From Prospects
	Order By LastName, FirstName

Including the All keyword forces duplicates to be preserved in the query result.

Inserting On-the-Fly Fields in Select Queries

Problem

I need to include additional information in a query’s output. The information is sometimes based on the fields in the query, but at other times, it just needs to be inserted as a fixed message. How can I do this?

Solution

In the Field row of the Access query design grid, you can enter a name that will appear in the output as a field name, as any standard field name would. Follow this with a colon (:)and the value that will go into the new field, and you have created a new output field that exists only during the run of the query. This field is not saved back into any source tables. The value that goes into the new field can be dependent on other fields in the query, or it can be completely independent.

Figure 1-39 shows a table of clients and balances due.

A table of clients and balances due
Figure 1-39. A table of clients and balances due

Figure 1-40 shows a query based on the table. In the query are two created fields that do not actually exist in the table. The first is named Client. The value for the Client field comes from a concatenation of the FirstName and LastName table fields.

Another new field—Message—provides a fixed string when the query is run. The Message field is populated with an expression that has nothing to do with any table fields.

A query with expression-based fields
Figure 1-40. A query with expression-based fields

As shown in Figure 1-41, the query result contains two fields that list the clients’ full names and the fixed message.

The result of running the query
Figure 1-41. The result of running the query

Discussion

Using expression-based fields in queries provides ways to treat records with some intelligence. A useful example involves using an expression to return a message for certain records based on the value of a table field. Figure 1-42 shows how the IIf function is incorporated into our derived Message field. Now, the message about the Spring Special will appear only in records that have a balance of 100 or less.

Using a condition in a field expression
Figure 1-42. Using a condition in a field expression

Here is the SQL statement for the query in Figure 1-42:

	SELECT [FirstName] & " " & [LastName] AS Client,
	IIf([Balance]<=100,"Don't Miss Our Spring Special!","")
	AS MessageFROM tblClients;

When you run this query, you’ll find that clients with balances over 100 do not have the message written into their records. What is the point of this? This technique may be useful in a mail merge, for example. When creating letters or statements to clients, you may wish to advertise the Spring Special to just those customers with a low balance or a balance of zero.

Using Aliases to Simplify Your SQL Statements

Problem

Table names precede field names in SQL statements, so queries that use multiple fields and tables wind up being very long. Is there a way to use shortcut identifiers instead for the table names?

Solution

Yes, there is! In a SQL statement, any table name can be given an alias. The place to do this is after the FROM keyword, where the table name is entered. Follow the table name with an alias of your choosing (make sure it is not a reserved word, an existing field name, etc.). Then, use the alias instead of the table name in the other areas of the SQL statement. Let’s look at an example.

Using the Access query grid to assemble the query results in this SQL statement that addresses a single table:

	SELECT tblCustomers.CustomerCompanyName,
	tblCustomers.CustomerFirstName,
	tblCustomers.CustomerLastName,
	tblCustomers.CustomerAddr1,
	tblCustomers.CustomerAddr2,
	tblCustomers.CustomerCity,
	tblCustomers.CustomerState,
	tblCustomers.CustomerZip,
	tblCustomers.CustomerHomePhone,
	tblCustomers.CustomerWorkPhone
	FROM tblCustomers;

Here is the same query, this time using the alias C for tblCustomers. The alias is placed just after the table name in the FROM section, and all references to the table name in the rest of the query just use the alias:

	SELECT C.CustomerCompanyName,
	C.CustomerFirstName, C.CustomerLastName,
	C.CustomerAddr1, C.CustomerAddr2, C.CustomerCity,
	C.CustomerState, C.CustomerZip, C.CustomerHomePhone,
	C.CustomerWorkPhone
	FROM tblCustomers C;

This SQL statement is much shorter and easier to follow.

Discussion

Aliases are also useful—perhaps even more so—with queries that address multiple tables. Here is the SQL statement of a query that addresses three tables (tblCustomers, tblInvoices, and tblInvoicePayments). Inner joins connect the tables on key fields:

	SELECT tblCustomers.CustomerCompanyName,
	tblCustomers.CustomerFirstName,
	tblCustomers.CustomerLastName,
	tblCustomers.CustomerAddr1,
	tblCustomers.CustomerAddr2,
	tblCustomers.CustomerCity,
	tblCustomers.CustomerState,
	tblCustomers.CustomerZip,
	tblInvoices.InvoiceNumber,
	tblInvoices.InvoiceDate,
	tblInvoices.Status,
	tblInvoices.Hours, tblInvoices.Rate,
	tblInvoicePayments.PaymentAmount,
	tblInvoicePayments.PaymentDate,
	tblInvoicePayments.PaymentType
	FROM (tblCustomers INNER JOIN tblInvoices ON
	tblCustomers.CustomerID = tblInvoices.CustomerID)
	INNER JOIN tblInvoicePayments ON
	tblInvoices.InvoiceID = tblInvoicePayments.InvoiceID;

Now, here is the same SQL statement, but with aliases of C for tblCustomers, I for tblInvoices, and P for tblInvoicePayments:

	SELECT C.CustomerCompanyName, C.CustomerFirstName,
	C.CustomerLastName, C.CustomerAddr1,
	C.CustomerAddr2, C.CustomerCity, C.CustomerState,
	C.CustomerZip, I.InvoiceNumber, I.InvoiceDate,
	I.Status, I.Hours, I.Rate, P.PaymentAmount,
	P.PaymentDate, P.PaymentTypeFROM (tblCustomers C INNER JOIN tblinvoices I ON
	C.CustomerID=I.CustomerID) INNER JOIN
	tblInvoicePayments P ON I.InvoiceID=P.InvoiceID;

Clearly, the SQL statement with aliases is shorter. Again, each table is assigned its alias just after its name appears in the FROM and INNER JOIN sections.

Creating a Left Join

Problem

I have a table of students and a table of courses they have taken. Not every student has taken a course. I want a listing of all the students and any courses they have taken, including students who have not yet taken a course. However, when I run the query normally, I only get back records of students who have taken at least one course.

Solution

Figure 1-43 shows the standard query you would use to query from two tables. This query will return all records from Students who have related records in Courses Taken. If, for a given student, there is more than one record in Courses Taken, the number of related records in Courses Taken is the number of records that will be returned for that given student. But students with no matched courses are left out of the returned records altogether.

Figure 1-43 shows an inner join. To ensure that all records from the master table (Students) appear in the results, the query must be changed to a left join. This is easy to do when the query is in design mode: either use the View → Join Properties menu option, or double-click on the line that connects the table to display the Join Properties dialog box, shown in Figure 1-44.

In the Join Properties dialog box are three numbered options. The first one is the standard inner join. The second one creates a left join. The third option creates a right join (see Creating a Right Join). Select the second option and click OK. Now, when the query is run, all records from the Students table will appear, as shown in Figure 1-45.

An inner join query returns only matched records
Figure 1-43. An inner join query returns only matched records
Setting the properties for a left join
Figure 1-44. Setting the properties for a left join

Discussion

A left join returns all records from the master table, and probably all records from the child table (here, Courses Taken). This last fact depends on whether referential integrity exists between the tables. In other words, if referential integrity is enforced, each record in the Courses Taken table must match to a record in the Students table. Then, even though there are student records with no matching courses, all course records must belong to students and hence are returned in the query.

The result of running a left join query
Figure 1-45. The result of running a left join query

If referential integrity is not applied, any records in Courses Taken that do not relate to records in the Students table will not be included in the query’s returned records.

Creating a Right Join

Problem

I have a parent table and a child table. The parent table contains customers, and the child table contains purchases, but some records in the Purchases table do not belong to any customer. I need to run a query that returns all the records from the Purchases table, even if there is no matching customer. At the very least, this will help me identify purchases that are not being billed to anyone.

Solution

The request here is for a right join. In a right join, all records are returned from the child table, including those that have no match in the parent table. For any such records to exist in the child table, referential integrity must not exist between the tables. The presence of orphan records is possible only when such records can exist outside the confines of referential integrity with the parent table.

Figure 1-46 shows how a right join is created: use the third option in the Join Properties dialog box (displayed via the View → Join Properties menu command).

Setting up a right join in a query
Figure 1-46. Setting up a right join in a query

Discussion

When a right join query is run, the number of returned records matches the number of records in the child table (assuming no criteria were used). For fields from the parent table, there will be blank data for the records in which there is no match between tables. Figure 1-47 shows the result of running the right join query.

The result of running a right join query
Figure 1-47. The result of running a right join query

Creating an Outer Join

Problem

I wish to combine the output of both a left join and a right join into a single query, but I can’t get Access to do this type of "outer join.”

Solution

A left join will return all the records from the table on the right side (the parent)and any related records from the table on the left side (the child). A right join will return all the records from the table on the left side and any related records from the table on the right side. An outer join combines these two outputs into one.

Access doesn’t directly support outer joins, but because it is reasonable to create left and right joins, these two constructs can be brought together with a union query (see Combining Data with Union Queries).

Figure 1-48 shows two database tables. Not all the teams in the Teams table have matching records in the Games table, and the Games table contains some records that have no relation to the teams in the Teams table.

Two tables with some related and some orphan records
Figure 1-48. Two tables with some related and some orphan records

To create an outer query, you must be in the SQL pane of the query designer. The following is the SQL that would simulate an outer query for these tables by combining the output of a left join with a right join:

	SELECT Teams.Team, Games.Date, Games.Location
	FROM Games LEFT JOIN Teams ON
	Games.TeamID=Teams.TeamID
	UNION
	SELECT Teams.Team, Games.Date, Games.Location
	FROM Games RIGHT JOIN Teams ON
	Games.TeamID = Teams.TeamID
	ORDER BY Team, Date;

Discussion

When the query is run (see Figure 1-49), there are, as expected, some blanks in the fields from the Teams table and the Games table (from Figure 1-48). The majority of records are matched. Running a standard inner join query on these tables returns 35 records—the count of records that match. This outer join result returns 49 records. There are 14 records that have blanks for the source from one table or the other.

The result of running an outer join
Figure 1-49. The result of running an outer join

Get Access Data Analysis Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.