O'Reilly logo

Access Data Analysis Cookbook by Wayne S. Freeze, Ken Bluttman

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Fine-Tuning Data Filtering with Subqueries

Problem

I need to determine which records in a table have above-average values for a particular quantitative field. How can I calculate the average and filter the records in one query?

Solution

The AVG aggregate function calculates the average value of a field across the records included in a query. While that is a straightforward operation, comparing the value in each record to the average presents a challenge. One way to do this is to use a subquery. A subquery is literally a query within a query, typically located within the WHERE section of the main query.

Figure 1-29 shows a table of teams and their scores for the season. The task is to identify which teams have a season score that is greater than the average of all the scores.

A table of teams and scores

Figure 1-29. A table of teams and scores

A little finesse with SQL is required to identify the teams that beat the average. The AVG aggregate function (see Finding the Sum or Average in a Set of Data)is needed, but it is not applied in the typical way in the query grid. unmatched_Figure 1-30 shows how the query is entered into the query grid. Select View → Totals while designing the query to display the Total row in the grid. Then, create an additional column in the grid based on the SeasonScore field. Don't select AVG in the Total row; instead, select Where from the drop-down list, and enter the subquery in the Criteria row.

A subquery design in the query grid

Figure 1-30. A subquery design in the query grid

In this example, the greater-than sign (>) precedes the subquery, since we are looking for scores that are greater than the average. The AVG function appears in the subquery itself, which has the following syntax:

	Select AVG(SeasonScore) From SeasonScores

Here's the full SQL statement:

	SELECT Team, SeasonScore
	FROM SeasonScores
	WHERE (((SeasonScore)>
	(Select AVG(SeasonScore) From SeasonScores)))
	GROUP BY Team, SeasonScore
	ORDER BY Team;

Discussion

In the preceding example, the subquery resides in the WHERE section of the outer query. An alternative is to have the subquery act as one of the fields in the outer query's SELECT section. Figure 1-31 shows two tables and a query. On the left is the SeasonScores table presented earlier. On the right is a related child table that lists game dates and locations for each team in the first table. The query, whose result is shown underneath the tables, has returned the date of the last game played by each team.

A subquery that queries a second table

Figure 1-31. A subquery that queries a second table

Here is the SQL statement of the Last Game Played Per Team query in Figure 1-31:

	SELECT SeasonScores.Team,
	(Select Max(Date) From Games Where
	Games.TeamID = SeasonScores.TeamID)
	AS [Date Of Last Game]
	FROM SeasonScores;

The subquery is placed where a field would typically go. It is encased in parentheses and is given an alias for the field name (Date Of Last Game) outside of the subquery. Within the subquery is the join between the two tables.

Tip

There are other ways to return the same information, using different query constructs—for example, the tables could be joined and a Max of Date could be used. Working it in as a subquery serves here to illustrate a new method.

Removing excessive queries

One of the advantages of using subqueries is the avoidance of nested queries. Let's look at an example. Say you have two tables: tblCustomers, which contains customer names; and tblCustomerReachOut, which contains the dates at which customers were contacted. The CustomerID field is present in both tables, as shown in Figure 1-32. Now, consider the task of gathering a list of customers who have not been contacted in more than 60 days.

A table of customers and a table of contact dates

Figure 1-32. A table of customers and a table of contact dates

Let's look at a common approach to extracting the desired information. First, a query is constructed that returns—from the table of contact dates—records in which the last contact date is more than 60 days from the present date, as shown in Figure 1-33.

A query that returns records based on elapsed time

Figure 1-33. A query that returns records based on elapsed time

The query in Figure 1-33 is then queried from another query. This next query effectively takes the records from the first query and matches them with customer names from the tblCustomers table. This second query is shown in Figure 1-34.

A query that uses the returned records of another query

Figure 1-34. A query that uses the returned records of another query

To summarize, two queries have been constructed, though one would do the trick. When this inefficient design is perpetuated repeatedly in a database, the database can become laden with dozens of queries that seem isolated on the surface. When viewing a list of all the queries in a database (on the Queries tab), there is no immediate method to know which queries are called by others. Nested queries do work, but they're often unnecessary.

In contrast to the approach just described, here is the SQL of a query/subquery construction that returns the same records:

	SELECTDISTINCT tblCustomers.CustomerFirstName,
	tblCustomers.CustomerLastName, tblCustomers.CustomerAddr1
	FROM tblCustomers
	WHERE (((tblCustomers.CustomerID)
	In (Select tblCustomerReachOut.CustomerID FROM tblCustomerReachOut
	WHERE DateDiff("d",[ContactDate],Now())>60)));

This SQL statement uses a subquery to return the CustomerIDs from the tblCustomerReachOut table for customers who were last contacted more than 60 days earlier (the DateDiff function is used to calculate the elapsed time). The returned CustomerIDs are matched with related records in the tblCustomers table, thereby returning the customer names. Note that the SQL statement uses the DISTINCT predicate (discussed in Returning Distinct Records), as the returned records from the tblCustomerReachOut table can include duplicate CustomerIDs. This makes sense because customers are likely to be contacted more than once. Using the DISTINCT predicate ensures that the final returned list of names will not contain duplicates.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required