The GROUP BY Clause

The GROUP BY clause, along with the aggregate functions, groups a result set into multiple groups, and then produces a single row of summary information for each group. For example, if we want to find the total number of orders for each customer, execute the following query:

            SELECT CUST_NBR, COUNT(ORDER_NBR) 
            FROM CUST_ORDER 
            GROUP BY CUST_NBR;

  CUST_NBR COUNT(ORDER_NBR)
---------- ----------------
       201                2
       231                6
       244                2
       255                6
       264                2
       288                2

6 rows selected.

The query produces one summary line of output for each customer. This is the essence of a GROUP BY query. We asked Oracle to GROUP the results BY CUST_NBR; therefore, it produced one output row for each distinct value of CUST_NBR. Each data value for a given customer represents a summary based on all rows for that customer.

The nonaggregate expression CUST_NBR in the SELECT list also appears in the GROUP BY clause. If we have a mix of aggregate and nonaggregate expressions in the SELECT list, SQL expects that we are trying to perform a GROUP BY operation, and we must also specify all nonaggregate expressions in the GROUP BY clause. SQL returns an error if we fail to do so. For example, if we omit the GROUP BY clause, the following error is returned:

            SELECT CUST_NBR, SALES_EMP_ID, COUNT(ORDER_NBR)
            FROM CUST_ORDER;
SELECT CUST_NBR, SALES_EMP_ID, COUNT(ORDER_NBR)
       *
ERROR at line 1:
ORA-00937: not a single-group group function

Similarly, if we forget to include all nonaggregate expressions from the SELECT list in the GROUP BY clause, SQL returns the following error:

            SELECT CUST_NBR, SALES_EMP_ID, COUNT(ORDER_NBR)
            FROM CUST_ORDER
            GROUP BY CUST_NBR;
SELECT CUST_NBR, SALES_EMP_ID, COUNT(ORDER_NBR)
                 *
ERROR at line 1:
ORA-00979: not a GROUP BY expression

Finally, we can’t use a group function (aggregate function) in the GROUP BY clause. We will get an error if we attempt to do so, as in the following example:

            SELECT CUST_NBR, COUNT(ORDER_NBR)
            FROM CUST_ORDER
            GROUP BY CUST_NBR, COUNT(ORDER_NBR);
GROUP BY CUST_NBR, COUNT(ORDER_NBR)
                   *
ERROR at line 3:
ORA-00934: group function is not allowed here

If we have a constant in our SELECT list, we don’t need to include it in the GROUP BY clause. However, including the constant in the GROUP BY clause doesn’t alter the result. Therefore, both the following statements will produce the same output:

            SELECT 'CUSTOMER', CUST_NBR, COUNT(ORDER_NBR) 
            FROM CUST_ORDER 
            GROUP BY CUST_NBR;

            SELECT 'CUSTOMER', CUST_NBR, COUNT(ORDER_NBR) 
            FROM CUST_ORDER
            GROUP BY 'CUSTOMER', CUST_NBR;

'CUSTOME   CUST_NBR COUNT(ORDER_NBR)
-------- ---------- ----------------
CUSTOMER        201                2
CUSTOMER        231                6
CUSTOMER        244                2
CUSTOMER        255                6
CUSTOMER        264                2
CUSTOMER        288                2

6 rows selected.

There are certain situations when we want an expression in the select list, but don’t want to group by the same. For example, we might want to display a line number along with the summary information for each customer. Attempt to do so using the following query, and we will get an error:

            SELECT ROWNUM, CUST_NBR, COUNT(ORDER_NBR)
            FROM CUST_ORDER
            GROUP BY CUST_NBR;
SELECT ROWNUM, CUST_NBR, COUNT(ORDER_NBR)
       *
ERROR at line 1:
ORA-00979: not a GROUP BY expression

If we include ROWNUM in the GROUP BY clause, we’ll get the following, unexpected result:

            SELECT ROWNUM, CUST_NBR, COUNT(ORDER_NBR) 
            FROM CUST_ORDER
            GROUP BY ROWNUM, CUST_NBR;

    ROWNUM   CUST_NBR COUNT(ORDER_NBR)
---------- ---------- ----------------
         1        231                1
         2        201                1
         3        255                1
         4        264                1
         5        244                1
         6        288                1
         7        231                1
         8        255                1
         9        255                1
        10        231                1
        11        231                1
        12        201                1
        13        255                1
        14        264                1
        15        244                1
        16        288                1
        17        231                1
        18        255                1
        19        255                1
        20        231                1

20 rows selected.

We certainly didn’t want this result, did we? We wanted to receive one summary row for each customer, and then to display ROWNUM for those lines. But when we include ROWNUM in the GROUP BY clause, it produces one summary row for each row selected from the table CUST_ORDER. To get the expected result, we should use the following SQL:

            SELECT ROWNUM, V.*
            FROM (SELECT CUST_NBR, COUNT(ORDER_NBR) 
                  FROM CUST_ORDER GROUP BY CUST_NBR) V;

    ROWNUM   CUST_NBR COUNT(ORDER_NBR)
---------- ---------- ----------------
         1        201                2
         2        231                6
         3        244                2
         4        255                6
         5        264                2
         6        288                2

6 rows selected.

The construct in the FROM clause is called an inline view. Read more about inline views in Chapter 5.

Syntactically, it is not mandatory to include all the expressions of the GROUP BY clause in the SELECT list. However, those expressions not in the SELECT list will not be represented in the output; therefore, the output may not make much sense. For example:

            SELECT COUNT(ORDER_NBR) 
            FROM CUST_ORDER 
            GROUP BY CUST_NBR;

COUNT(ORDER_NBR)
----------------
               2
               6
               2
               6
               2
               2

6 rows selected.

This query produces a count of orders for each customer (by grouping based on CUST_NBR), but without the CUST_NBR in the output we can’t associate the counts with the customers. Extending the previous example, we can see that without a consistent SELECT list and GROUP BY clause, the output may be a bit confusing. The following example produces output that at first glance seems useful:

            SELECT CUST_NBR, COUNT(ORDER_NBR) 
            FROM CUST_ORDER 
            GROUP BY CUST_NBR, ORDER_DT;

  CUST_NBR COUNT(ORDER_NBR)
---------- ----------------
       201                2
       231                2
       231                4
       244                2
       255                2
       255                2
       255                2
       264                2
       288                2

9 rows selected.

From the output, it appears that we are trying to obtain a count of orders for each customer. However, there are multiple rows in the output for some CUST_NBR values. The fact that we have included ORDER_DT in the GROUP BY clause, and therefore generated a summary result for each combination of CUST_NBR and ORDER_DT, is missing from the output. We can’t make sense of the output unless the output and the SQL statement are looked at together. We can’t expect all readers of SQL output to understand SQL syntax, can we? Therefore, we always recommend maintaining consistency between the nonaggregate expressions in the SELECT list and the expressions in the GROUP BY clause. A more meaningful version of the previous SQL statement would be as follows:

            SELECT CUST_NBR, ORDER_DT, COUNT(ORDER_NBR) 
            FROM CUST_ORDER
            GROUP BY CUST_NBR, ORDER_DT;

  CUST_NBR ORDER_DT  COUNT(ORDER_NBR)
---------- --------- ----------------
       201 19-JUL-01                2
       231 18-JUL-01                2
       231 22-JUL-01                4
       244 18-JUL-01                2
       255 12-JUL-01                2
       255 20-JUL-01                2
       255 21-JUL-01                2
       264 16-JUL-01                2
       288 22-JUL-01                2

9 rows selected.

This output is consistent with the GROUP BY clause in the query. We’re more likely to make the correct assumption about what this output represents.

GROUP BY Clause and NULL Values

When we GROUP BY a column that contains NULL values for some rows, all the rows with NULL values are placed into a single group and presented as one summary row in the output. For example:

               SELECT SALE_PRICE, COUNT(ORDER_NBR) 
               FROM CUST_ORDER 
               GROUP BY SALE_PRICE;

SALE_PRICE COUNT(ORDER_NBR)
---------- ----------------
        25                4
        34                2
        56                4
        99                4
                          6

Notice that the last row in the output consists of a NULL value for the column SALE_PRICE. Since the GROUP BY clause inherently performs an ORDER BY on the group by columns, the row containing the NULL value is put at the end. If we want this row to be the first row in the output, we can perform an ORDER BY on SALE_PRICE in descending order:

               SELECT SALE_PRICE, COUNT(ORDER_NBR) 
               FROM CUST_ORDER 
               GROUP BY SALE_PRICE
               ORDER BY SALE_PRICE DESC;

SALE_PRICE COUNT(ORDER_NBR)
---------- ----------------
                          6
        99                4
        56                4
        34                2
        25                4

GROUP BY Clause with WHERE Clause

While producing summary results using the GROUP BY clause, we can filter records from the table based on a WHERE clause, as in the following example, which produces a count of orders in which the sale price exceeds $25.00 for each customer:

               SELECT CUST_NBR, COUNT(ORDER_NBR)
               FROM CUST_ORDER
               WHERE SALE_PRICE > 25
               GROUP BY CUST_NBR;

  CUST_NBR COUNT(ORDER_NBR)
---------- ----------------
       231                4
       244                2
       264                2
       288                2

While executing a SQL statement with a WHERE clause and a GROUP BY clause, Oracle first applies the WHERE clause and filters out the rows that don’t satisfy the WHERE condition. The rows that satisfy the WHERE clause are then grouped using the GROUP BY clause.

The SQL syntax requires that the WHERE clause must come before the GROUP BY clause. Otherwise, the following error is returned:

               SELECT CUST_NBR, COUNT(ORDER_NBR)
               FROM CUST_ORDER
               GROUP BY CUST_NBR
               WHERE SALE_PRICE > 25;
WHERE SALE_PRICE > 25
*
ERROR at line 4:
ORA-00933: SQL command not properly ended

Get Mastering Oracle SQL now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.