When the Server Does Not Answer

Occasionally a MySQL client gets the catastrophic message "Lost connection to server during query" or "Server has gone away." Although I hope you will never face this problem, it’s good to be prepared in case it happens. There are two main reasons for this problem caused by the MySQL installation itself: server issues (most likely a crash) or the misuse of connection options (usually timeout options or max_allowed_packet).

We will discuss the configuration of connections in Chapter 3. Problems caused by hardware and third-party software will be touched on in Chapter 4. Here I want to briefly describe what to do if the server crashes.

First, determine whether you really had a server crash. You can do this with the help of process status monitors. If you run mysqld_safe or another daemon that restarts the server after a failure, the error log will contain a message indicating the server has been restarted. When mysqld starts, it always prints something like this to the error logfile:

110716 14:01:44 [Note] /apps/mysql-5.1/libexec/mysqld: ready for connections.
Version: '5.1.59-debug'  socket: '/tmp/mysql51.sock'  port: 3351  Source distribution

So if you find such a message, the server has been restarted. If there is no message and the server is up and running, a lost connection is most likely caused by the misuse of connection options, which will be discussed in Chapter 3.

Note

If you remember when your MySQL server was originally started, you can use the status variable uptime, which shows the number of seconds since the server started:

mysql> SHOW GLOBAL STATUS LIKE 'uptime';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Uptime        | 10447 |
+---------------+-------+
1 row in set (0.00 sec)

This information will also help when you want to check whether mysqld failure was not caused by an operating system restart. Just compare the value of this variable with the operating system uptime.

The reason why I rely on error logfiles comes from my job experience with cases when customers notice a problem hours after a server crash and even some time after a planned mysqld restart.

If you confirm that the server has restarted, you should examine the error log again and look for information about the crash itself. Usually you can derive enough information about the crash from the error log to avoid the same situation in the future. We’ll discuss how to investigate the small number of difficult cases you may encounter in Chapter 6. Now let’s go back to the error logfile and see examples of its typical contents in case of a server crash. I’ll list a large extract here:

Version: '5.1.39' socket: '/tmp/mysql_sandbox5139.sock' port: 5139 
MySQL Community Server (GPL)
091002 14:56:54 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.
key_buffer_size=8384512
read_buffer_size=131072
max_used_connections=1
max_threads=151
threads_connected=1

It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 338301 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
thd: 0x69e1b00
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x450890f0 thread_stack 0x40000
/users/ssmirnova/blade12/5.1.39/bin/mysqld(my_print_stacktrace+0x2e)[0x8ac81e]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(handle_segfault+0x322)[0x5df502]
/lib64/libpthread.so.0[0x3429e0dd40]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN6String4copyERKS_+0x16)[0x5d9876]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN14Item_cache_str5storeEP4Item+0xc9)
[0x52ddd9]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN26select_singlerow_subselect9send_
dataER4ListI4ItemE+0x45)
[0x5ca145]
/users/ssmirnova/blade12/5.1.39/bin/mysqld[0x6386d1]
/users/ssmirnova/blade12/5.1.39/bin/mysqld[0x64236a]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN4JOIN4execEv+0x949)[0x658869]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN30subselect_single_select_
engine4execEv+0x36c)[0x596f3c]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN14Item_subselect4execEv+0x26)[0x595d96]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN24Item_singlerow_subselect8val_
realEv+0xd)[0x595fbd]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN14Arg_comparator18compare_real_
fixedEv+0x39)[0x561b89]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN12Item_func_ne7val_intEv+0x23)[0x568fb3]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN4JOIN8optimizeEv+0x12ef)[0x65208f]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_Z12mysql_selectP3THDPPP4ItemP10TABLE_
LISTjR4ListIS1_ES2_jP8
st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex+0xa0)
[0x654850]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_Z13handle_selectP3THDP6st_lexP13select_
resultm+0x16c)
[0x65a1cc]
/users/ssmirnova/blade12/5.1.39/bin/mysqld[0x5ecbda]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_Z21mysql_execute_commandP3THD+0x602)
[0x5efdd2]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_Z11mysql_parseP3THDPKcjPS2_+0x357)
[0x5f52f7]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_Z16dispatch_command19enum_server_
commandP3THDPcj+0xe93)
[0x5f6193]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_Z10do_commandP3THD+0xe6)[0x5f6a56]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(handle_one_connection+0x246)[0x5e93f6]
/lib64/libpthread.so.0[0x3429e061b5]
/lib64/libc.so.6(clone+0x6d)[0x34292cd39d]`)
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x6a39e60 = select 1 from `t1` where `c0` <>
(select geometrycollectionfromwkb(`c3`) from `t1`)
thd->thread_id=2
thd->killed=NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what

The key line indicating the reason for the crash is:

091002 14:56:54 - mysqld got signal 11 ;

This means the MySQL server was killed after it asked the operating system for a resource (e.g., access to a file or RAM), getting an error with code 11. In most operating systems, this signal refers to a segmentation fault. You can find more detailed information in the user manual for your operating system. Run man signal for Unix and Linux. In Windows, similar cases will usually generate a log message like “mysqld got exception 0xc0000005.” Search the Windows user manual for the meanings of these exception codes.

The following is the excerpt from the log about a query that was running in the thread that crashed the server:

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x6a39e60 = SELECT 1 FROM `t1` WHERE `c0` <>
(SELECT geometrycollectionfromwkb(`c3`) FROM `t1`)
thd->thread_id=2
thd->killed=NOT_KILLED

To diagnose, rerun the query to see whether it was the cause of the crash:

mysql> SELECT 1 FROM `t1` WHERE `c0` <> (SELECT
geometrycollectionfromwkb(`c3`) FROM `t1`);
ERROR 2013 (HY000): Lost connection to MySQL server during query

Note

When I recommend repeating problems, I assume you will use the development server and not your production server. We will discuss how to safely troubleshoot in an environment dedicated to this purpose in Sandboxes. Please don’t try to repeat this example; it is based on a known bug #47780 that’s fixed in current versions. The fix exists since versions 5.0.88, 5.1.41, 5.5.0, and 6.0.14.

At this point, you have identified the source of the crash and confirmed this was really the source, but you have to rewrite the query so it does not cause a crash next time. Now we can get help from the backtrace that was printed in the log:

Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x450890f0 thread_stack 0x40000
/users/ssmirnova/blade12/5.1.39/bin/mysqld(my_print_stacktrace+0x2e)[0x8ac81e]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(handle_segfault+0x322)[0x5df502]
/lib64/libpthread.so.0[0x3429e0dd40]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN6String4copyERKS_+0x16)[0x5d9876]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN14Item_cache_str5storeEP4Item+0xc9)
[0x52ddd9]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN26select_singlerow_subselect9
send_dataER4ListI4ItemE+0x45)
[0x5ca145]
/users/ssmirnova/blade12/5.1.39/bin/mysqld[0x6386d1]
/users/ssmirnova/blade12/5.1.39/bin/mysqld[0x64236a]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN4JOIN4execEv+0x949)[0x658869]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN30subselect_single_select_engine4execEv
+0x36c)[0x596f3c]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN14Item_subselect4execEv+0x26)[0x595d96]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN24Item_singlerow_subselect8val_realEv
+0xd)[0x595fbd]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN14Arg_comparator18compare_real_fixedEv
+0x39)[0x561b89]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN12Item_func_ne7val_intEv+0x23)[0x568fb3]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN4JOIN8optimizeEv+0x12ef)[0x65208f]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_Z12mysql_selectP3THDPPP4ItemP10TABLE_
LISTjR4ListIS1_ES2_jP8
st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex+0xa0)
[0x654850]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_Z13handle_selectP3THDP6st_lexP13select_
resultm+0x16c)
[0x65a1cc]
/users/ssmirnova/blade12/5.1.39/bin/mysqld[0x5ecbda]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_Z21mysql_execute_commandP3THD+0x602)
[0x5efdd2]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_Z11mysql_parseP3THDPKcjPS2_+0x357)
[0x5f52f7]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_Z16dispatch_command19enum_server_
commandP3THDPcj+0xe93)
[0x5f6193]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_Z10do_commandP3THD+0xe6)[0x5f6a56]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(handle_one_connection+0x246)[0x5e93f6]
/lib64/libpthread.so.0[0x3429e061b5]
/lib64/libc.so.6(clone+0x6d)[0x34292cd39d]`)

The relevant lines are the calls to Item_subselect and Item_singlerow_subselect:

/users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN14Item_subselect4execEv+0x26)[0x595d96]
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN24Item_singlerow_subselect8val_realEv
+0xd)[0x595fbd]

How did I decide these were the culprits? In this case, I recognized the calls from my previous troubleshooting. But a good rule of thumb is to start from the top. These first functions are usually operating system calls, which can be relevant, but are of no help in these circumstances, because you cannot do anything with them, and then follow calls to the MySQL library. Examine these from top to bottom to find which ones you can affect. You can’t do anything with String4copy or Item_cache_str5store, for instance, but you can rewrite a subselect, so we’ll start from there.

Here, even without looking into the source code for mysqld, we can play around to find the cause of the crash. It’s a good guess that the use of a subquery is the problem because subqueries can be converted easily to JOIN. Let’s try rewriting the query and testing it:

mysql> SELECT 1 FROM `t1` WHERE `c0` <> geometrycollectionfromwkb(`c3`);
Empty set (0.00 sec)

The new query does not crash, so all you need to do is change the query in the application to match.

  • You just learned something important about MySQL troubleshooting: the first thing to check in case of an unknown error is the error logfile. Always have it turned on.

Here I want to add a note about bugs. When you are faced with a crash and identify the reason, check the MySQL bug database for similar problems. When you find a bug that looks like the one you hit, check whether it has been fixed, and if so, upgrade your server to the version where the bug was fixed (or a newer one). This can save you time because you won’t have to fix the problematic query.

If you can’t find a bug similar to one you hit, try downloading the latest version of the MySQL server and running your query. If the bug is reproducible there, report it. It’s important to use the latest versions of stable general availability (GA) releases because they contain all current bug fixes and many old issues won’t reappear. Chapter 6 discusses how to safely test crashes in sandbox environments.

Crashes can be caused not only by particular queries, but also by the environment in which the server runs. The most common reason is the lack of free RAM. This happens particularly when the user allocates huge buffers. As I noted before, mysqld always needs slightly more RAM than the sum of all buffers. Usually the error logfile contains a rough estimation of the RAM that can be used. It looks like this:

key_buffer_size=235929600
read_buffer_size=4190208
max_used_connections=17
max_connections=2048
threads_connected=13
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 21193712 K
-----
21193712K ~= 20G

Such estimations are not precise, but still worth checking. The one just shown claims that mysqld can use up to 20G RAM! You can get powerful boxes nowadays, but it is worth checking whether you really have 20G RAM.

Another issue in the environment is other applications that run along with the MySQL server. It is always a good idea to dedicate a server for MySQL in production because other applications can use resources that you expect MySQL to use. We will describe how to debug the effects of other applications on mysqld in Chapter 4.

Get MySQL Troubleshooting now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.