## With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

No credit card required

# Fine-Tuning Data Filtering with Subqueries

## Problem

I need to determine which records in a table have above-average values for a particular quantitative field. How can I calculate the average and filter the records in one query?

## Solution

The `AVG` aggregate function calculates the average value of a field across the records included in a query. While that is a straightforward operation, comparing the value in each record to the average presents a challenge. One way to do this is to use a subquery. A subquery is literally a query within a query, typically located within the `WHERE` section of the main query.

Figure 1-29 shows a table of teams and their scores for the season. The task is to identify which teams have a season score that is greater than the average of all the scores.

A little finesse with SQL is required to identify the teams that beat the average. The `AVG` aggregate function (see Finding the Sum or Average in a Set of Data)is needed, but it is not applied in the typical way in the query grid. unmatched_Figure 1-30 shows how the query is entered into the query grid. Select View → Totals while designing the query to display the Total row in the grid. Then, create an additional column in the grid based on the SeasonScore field. Don't select AVG in the Total row; instead, select Where from the drop-down list, and enter the subquery in the Criteria row.

In this example, the greater-than sign (>) precedes the subquery, since we are looking for scores that are greater than the average. The `AVG` function appears in the subquery itself, which has the following syntax:

`	Select AVG(SeasonScore) From SeasonScores`

Here's the full SQL statement:

```	SELECT Team, SeasonScore
FROM SeasonScores
WHERE (((SeasonScore)>
(Select AVG(SeasonScore) From SeasonScores)))
GROUP BY Team, SeasonScore
ORDER BY Team;```

## Discussion

In the preceding example, the subquery resides in the `WHERE` section of the outer query. An alternative is to have the subquery act as one of the fields in the outer query's `SELECT` section. Figure 1-31 shows two tables and a query. On the left is the SeasonScores table presented earlier. On the right is a related child table that lists game dates and locations for each team in the first table. The query, whose result is shown underneath the tables, has returned the date of the last game played by each team.

Here is the SQL statement of the Last Game Played Per Team query in Figure 1-31:

```	SELECT SeasonScores.Team,
(Select Max(Date) From Games Where
Games.TeamID = SeasonScores.TeamID)
AS [Date Of Last Game]
FROM SeasonScores;```

The subquery is placed where a field would typically go. It is encased in parentheses and is given an alias for the field name (Date Of Last Game) outside of the subquery. Within the subquery is the join between the two tables.

### Tip

There are other ways to return the same information, using different query constructs—for example, the tables could be joined and a `Max` of `Date` could be used. Working it in as a subquery serves here to illustrate a new method.

### Removing excessive queries

One of the advantages of using subqueries is the avoidance of nested queries. Let's look at an example. Say you have two tables: tblCustomers, which contains customer names; and tblCustomerReachOut, which contains the dates at which customers were contacted. The CustomerID field is present in both tables, as shown in Figure 1-32. Now, consider the task of gathering a list of customers who have not been contacted in more than 60 days.

Let's look at a common approach to extracting the desired information. First, a query is constructed that returns—from the table of contact dates—records in which the last contact date is more than 60 days from the present date, as shown in Figure 1-33.

The query in Figure 1-33 is then queried from another query. This next query effectively takes the records from the first query and matches them with customer names from the tblCustomers table. This second query is shown in Figure 1-34.

To summarize, two queries have been constructed, though one would do the trick. When this inefficient design is perpetuated repeatedly in a database, the database can become laden with dozens of queries that seem isolated on the surface. When viewing a list of all the queries in a database (on the Queries tab), there is no immediate method to know which queries are called by others. Nested queries do work, but they're often unnecessary.

In contrast to the approach just described, here is the SQL of a query/subquery construction that returns the same records:

```	SELECTDISTINCT tblCustomers.CustomerFirstName,
This SQL statement uses a subquery to return the CustomerIDs from the tblCustomerReachOut table for customers who were last contacted more than 60 days earlier (the `DateDiff` function is used to calculate the elapsed time). The returned CustomerIDs are matched with related records in the tblCustomers table, thereby returning the customer names. Note that the SQL statement uses the `DISTINCT` predicate (discussed in Returning Distinct Records), as the returned records from the tblCustomerReachOut table can include duplicate CustomerIDs. This makes sense because customers are likely to be contacted more than once. Using the `DISTINCT` predicate ensures that the final returned list of names will not contain duplicates.