Wrong Results from a SELECT

This is another frequent problem reported by users of an application who don’t see the updates they made, see them in the wrong order, or see something they don’t expect.

There are two main reasons for getting wrong results: something is wrong with your SELECT query, or the data in database differs from what you expect. I’ll start with the first case.

When I went over examples for this section, I had to either show some real-life examples or write my own toy cases. The real-life examples can be overwhelmingly large, but the toy cases wouldn’t be much help to you, because nobody writes such code. So I’ve chosen to use some typical real-life examples, but simplified them dramatically.

The first example involves a common user mistake when using huge joins. We will use Example 1-1, described in the previous section. This table contains my collection of MySQL features that cause common usage mistakes I deal with in MySQL Support. Each mistake has a row in the items table. I have another table of links to resources for information. Because there’s a many-to-many relationship between items and links, I also maintain an items_links table to tie them together. Here are the definitions of the items and items_links table (we don’t need links in this example):

mysql> DESC items;
+-------------------+--------------+------+-----+---------+----------------+
| Field             | Type         | Null | Key | Default | Extra          |
+-------------------+--------------+------+-----+---------+----------------+
| id                | int(11)      | NO   | PRI | NULL    | auto_increment |
| short_description | varchar(255) | YES  |     | NULL    |                |
| description       | text         | YES  |     | NULL    |                |
| example           | text         | YES  |     | NULL    |                |
| explanation       | text         | YES  |     | NULL    |                |
| additional        | text         | YES  |     | NULL    |                |
+-------------------+--------------+------+-----+---------+----------------+
6 rows in set (0.30 sec)

mysql> DESC items_links;
+--------+---------+------+-----+---------+-------+
| Field  | Type    | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+-------+
| iid    | int(11) | YES  | MUL | NULL    |       |
| linkid | int(11) | YES  | MUL | NULL    |       |
+--------+---------+------+-----+---------+-------+
2 rows in set (0.11 sec)

The first query I wrote worked fine and returned a reasonable result:

mysql> SELECT count(*) FROM items WHERE id IN (SELECT id FROM items_links);
+----------+
| count(*) |
+----------+
|       10 |
+----------+
1 row in set (0.12 sec)

...until I compared the number returned with the total number of links:

mysql> SELECT count(*) FROM items_links;
+----------+
| count(*) |
+----------+
|        6 |
+----------+
1 row in set (0.09 sec)

How could it be possible to have more links than associations?

Let’s examine the query, which I made specially for this book, once more. It is simple and contains only two parts, a subquery:

SELECT id FROM items_links

and an outer query:

SELECT count(*) FROM items WHERE id IN ...

The subquery can be a good place to start troubleshooting because one should be able to execute it independently. Therefore, we can expect a compete result set:

mysql> SELECT id FROM items_links;
ERROR 1054 (42S22): Unknown column 'id' in 'field list'

Surprise! We have a typo, and actually there is no field named id in the items_links table; it says iid (for “items ID”) instead. If we rewrite our query so that it uses the correct identifiers, it will work properly:

mysql> SELECT count(*) FROM items WHERE id IN (SELECT iid FROM items_links);
+----------+
| count(*) |
+----------+
|        4 |
+----------+
1 row in set (0.08 sec)
  • We just learned a new debugging technique. If a SELECT query does not work as expected, split it into smaller chunks, and then analyze each part until you find the cause of the incorrect behavior.

Note

If you specify the full column name by using the format table_name.column_name, you can prevent the problems described here in the first place because you will get an error immediately:

mysql> SELECT count(*) FROM items WHERE items.id IN 
       (SELECT items_links.id FROM items_links); 
ERROR 1054 (42S22): Unknown column 'items_links.id' in 'field list'

A good tool for testing is the simple MySQL command-line client that comes with a MySQL installation. We will discuss the importance of this tool in Chapter 6.

But why didn’t MySQL return the same error for the first query? We have a field named id in the items table, so MySQL thought we wanted to run a dependent subquery that actually selects items.id from items_links. A “dependent subquery” is one that refers to fields from the outer query.

We can also use EXPLAIN EXTENDED followed by SHOW WARNINGS to find the mistake. If we run these commands on the original query, we get:

mysql> EXPLAIN EXTENDED SELECT count(*) FROM items WHERE id IN 
(SELECT id FROM items_links)\G
2 rows in set, 2 warnings (0.12 sec)
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: items
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 10
     filtered: 100.00
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: items_links
         type: index
possible_keys: NULL
          key: iid_2
      key_len: 5
          ref: NULL
         rows: 6
     filtered: 100.00
        Extra: Using where; Using index
2 rows in set, 2 warnings (0.54 sec)

mysql> show warnings\G
*************************** 1. row ***************************
  Level: Note
   Code: 1276
Message: Field or reference 'collaborate2011.items.id' of SELECT #2 was resolved
in SELECT #1
*************************** 2. row ***************************
  Level: Note
   Code: 1003
Message: select count(0) AS `count(*)` from `collaborate2011`.`items` where
<in_optimizer7gt;(`collaborate2011`.`items`.`id`,<exists>(select 1 from
`collaborate2011`.`items_links` where
(<cache>(`collaborate2011`.`items`.`id`) =
`collaborate2011`.`items`.`id`)))
2 rows in set (0.00 sec)

Row 2 of the EXPLAIN EXTENDED output shows that the subquery is actually dependent: select_type is DEPENDENT SUBQUERY.

Before moving on from this example, I want to show one more technique that will help you avoid getting lost when your query contains lots of table references. It is easy to get lost if you join 10 or more tables in a single query, even when you know how they should be joined.

The interesting part of the previous example was the output of SHOW WARNINGS. The MySQL server does not always execute a query as it was typed, but invokes the optimizer to create a better execution plan so that the user usually gets the results faster. Following EXPLAIN EXTENDED, the SHOW WARNINGS command shows the optimized query.

In our example, the SHOW WARNINGS output contains two notes. The first is:

Field or reference 'collaborate2011.items.id' of SELECT #2 was resolved in SELECT #1

This note clearly shows that the server resolved the value of id in the subquery from the items table rather than from items_links.

The second note contains the optimized query:

select count(0) AS `count(*)` from `collaborate2011`.`items` where <in_optimizer>
(`collaborate2011`.`items`.`id`,<exists>
(select 1 from `collaborate2011`.`items_links` where
(<cache>(`collaborate2011`.`items`.`id`) = `collaborate2011`.`items`.`id`)))

This output also shows that the server takes the value of id from the items table.

Now let’s compare the previous listing with the result of EXPLAIN EXTENDED on the correct query:

mysql> EXPLAIN EXTENDED SELECt count(*) FROM items WHERE id IN
(SELECT iid FROM items_links)\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: items
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 10
     filtered: 100.00
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: items_links
         type: index_subquery
possible_keys: iid,iid_2
          key: iid
      key_len: 5
          ref: func
         rows: 1
     filtered: 100.00
        Extra: Using index; Using where
2 rows in set, 1 warning (0.03 sec)

mysql> show warnings\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: select count(0) AS `count(*)` from `collaborate2011`.`items` where
<in_optimizer>(`collaborate2011`.`items`.`id`,<exists>
(<index_lookup>(<cache>(`collaborate2011`.`items`.`id`) in
items_links on iid where (<cache>(`collaborate2011`.`items`.`id`) =
`collaborate2011`.`items_links`.`iid`))))
1 row in set (0.00 sec)

The optimized query this time looks completely different, and really compares items.id with items_links.iid as we intended.

  • We just learned another lesson: use EXPLAIN EXTENDED followed by SHOW WARNINGS to find how a query was optimized (and executed).

The value of select_type in the correct query is still DEPENDENT SUBQUERY. How can that be if we resolve the field name from the items_links table? The explanation starts with the part of the SHOW WARNINGS output that reads as follows:

where (<cache>(`collaborate2011`.`items`.`id`) = 
`collaborate2011`.`items_links`.`iid`)

The subquery is still dependent because the id in clause of the outer query requires the subquery to check its rows against the value of iid in the inner query. This issue came up in the discussion of report #12106 in the MySQL Community Bugs Database.

  • I added a link to the bug report because it provides another important lesson: if you doubt the behavior of your query, use good sources to find information. The community bug database is one such source.

There can be many different reasons why a SELECT query behaves incorrectly, but the general method of investigation is always the same:

  • Split the query into small chunks, and then execute them one by one until you see the cause of the problem.

  • Use EXPLAIN EXTENDED followed by SHOW WARNINGS to get the query execution plan and information on how it was actually executed.

  • If you don’t understand the MySQL server behavior, use the Internet and good sources for information. The Appendix A includes a list of useful resources.

Get MySQL Troubleshooting now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.