Replication in MySQL

This section describes the various replication features in MySQL. It serves as a reference to the options available with replication. You will be introduced to replication and learn how to implement it. Toward the end, there are some frequently asked questions and descriptions of problems and how to solve them.

We suggest that you visit our web site at http://www.mysql.com/ often and read updates to this section. Replication is constantly being improved, and we update the manual frequently with the most current information.

Introduction

One way replication can be used is to increase both robustness and speed. For robustness you can have two systems and can switch to the backup if you have problems with the master. The extra speed is achieved by sending a part of the non-updating queries to the replica server. Of course, this only works if non-updating queries dominate, but that is the normal case.

Starting in Version 3.23.15, MySQL supports one-way replication internally. One server acts as the master, while the other acts as the slave. Note that one server could play the roles of master in one pair and slave in the other. The master server keeps a binary log of updates (see Section 4.9.4) and an index file to binary logs to keep track of log rotation. The slave, upon connecting, informs the master where it left off since the last successfully propagated update, catches up on the updates, and then blocks and waits for the master to notify it of the new updates.

Note that if you are replicating a database, all updates to this database should be done through the master!

Another benefit of using replication is that one can get live backups of the system by doing a backup on a slave instead of doing it on the master. See Section 4.4.1.

Replication Implementation Overview

MySQL replication is based on the server keeping track of all changes to your database (updates, deletes, etc.) in the binary log (see Section 4.9.4) and the slave server(s) reading the saved queries from the master server’s binary log so that the slave can execute the same queries on its copy of the data.

It is very important to realise that the binary log is simply a record starting from a fixed point in time (the moment you enable binary logging). Any slaves that you set up will need copies of all the data from your master as it existed the moment that you enabled binary logging on the master. If you start your slaves with data that doesn’t agree with what was on the master when the binary log was started, your slaves may fail.

Starting in 4.0.0, one can use LOAD DATA FROM MASTER to set up a slave. Note that 4.0.0 slaves cannot communicate with 3.23 masters, but 4.0.1 and later version slaves can. 3.23 slaves cannot talk to 4.0 masters.

You must also be aware that LOAD DATA FROM MASTER currently works only if all the tables on the master are MyISAM type, and will acquire a global read lock, so no writes are possible while the tables are being transferred from the master. This limitation is of a temporary nature, and is due to the fact that we have not yet implement hot lock-free table backup. It will be removed in the future 4.0 branch versions once we implemented hot backup enabling LOAD DATA FROM MASTER to work without blocking master updates.

Due to the aforementioned limitation, we recommend that at this point you use LOAD DATA FROM MASTER only if the dataset on the master is relatively small, or if a prolonged read lock on the master is acceptable. While the actual speed of LOAD DATA FROM MASTER may vary from system to system, a good rule for a rough estimate of how long it is going to take is 1 second per 1M of the data file. You will get close to the estimate if both master and slave are equivalent to 700MHz Pentium, are connected through 100Mit/s network, and your index file is about half the size of your data file. Of course, your mileage will vary from system to system. This rule just gives you a rough order of magnitude estimate.

Once a slave is properly configured and running, it will simply connect to the master and wait for updates to process. If the master goes away or the slave loses connectivity with your master, it will keep trying to connect every master-connect-retry seconds until it is able to reconnect and resume listening for updates.

Each slave keeps track of where it left off. The master server has no knowledge of how many slaves there are or which ones are up-to-date at any given time.

The next section explains the master/slave setup process in more detail.

How to Set Up Replication

Here is a quick description of how to set up complete replication on your current MySQL server. It assumes you want to replicate all your databases and have not configured replication before. You will need to shut down your master server briefly to complete the following steps.

While this method is the most straightforward way to set up a slave, it is not the only one. For example, if you already have a snapshot of the master, and the master already has server id set and binary logging enabled, you can set up a slave without shutting the master down or even blocking the updates. For more details, please see Section 4.10.7.

If you want to become a real MySQL replication guru, we suggest that you begin by studying, pondering, and trying all commands mentioned in Section 4.10.6. You should also familiarize yourself with replication startup options in my.cnf in Section 4.10.5.

Make sure you have a recent version of MySQL installed on the master and slave(s).
Use Version 3.23.29 or higher. Previous releases used a different binary log format and had bugs which have been fixed in newer releases. Please, do not report bugs until you have verified that the problem is present in the latest release.
Set up a special replication user on the master with the FILE privilege and permission to connect from all the slaves. If the user is only doing replication (which is recommended), you don’t need to grant any additional privileges.
For example, to create a usernamed repl which can access your master from any host, you might use this command:
```
mysql> GRANT FILE ON *.* TO repl@"%" IDENTIFIED BY '<password>';
```

Shut down MySQL on the master.

mysqladmin -u root -p<password> shutdown

Snapshot all the data on your master server.
The easiest way to do this (on Unix) is to simply use tar to produce an archive of your entire data directory. The exact data directory location depends on your installation.
```
tar -cvf /tmp/mysql-snapshot.tar /path/to/data-dir
```
Windows users can use WinZIP or similar software to create an archive of the data directory.
In my.cnf on the master add log-bin and server-id=unique number to the [mysqld] section and restart it. It is very important that the id of the slave is different from the id of the master. Think of server-id as something similar to the IP address—it uniquely identifies the server instance in the community of replication partners.
```
[mysqld]
log-bin
server-id=1
```
Restart MySQL on the master.
Add the following to my.cnf on the slave(s):
```
master-host=<hostname of the master>
master-user=<replication username>
master-password=<replication user password>
master-port=<TCP/IP port for master>
server-id=<some unique number between 2 and 2^32-1>
```
replacing the values in <> with what is relevant to your system.
server-id must be different for each server participating in replication. If you don’t specify a server-id, it will be set to 1 if you have not defined master-host. Otherwise, it will be set to 2. Note that in the case of server-id omission the master will refuse connections from all slaves, and the slave will refuse to connect to a master. Thus, omitting server-id is only good for backup with a binary log.
Copy the snapshot data into your data directory on your slave(s). Make sure that the privileges on the files and directories are correct. The user that MySQL runs as needs to be able to read and write to them, just as on the master.
Restart the slave(s).

After you have done this, the slave(s) should connect to the master and catch up on any updates that happened since the snapshot was taken.

If you have forgotten to set server-id for the slave you will get the following error in the error log file:

Warning: one should set server_id to a non-0 value if master_host is set.
The server will not act as a slave.

If you have forgotten to do this for the master, the slaves will not be able to connect to the master.

If a slave is not able to replicate for any reason, you will find error messages in the error log on the slave.

Once a slave is replicating, you will find a file called master.info in the same directory as your error log. The master.info file is used by the slave to keep track of how much of the master’s binary log it has processed. Do not remove or edit the file, unless you really know what you are doing. Even in that case, it is preferred that you use CHANGE MASTER TO command.

Replication Features and Known Problems

The following is an explanation of what is supported and what is not:

Replication will be done correctly with AUTO_INCREMENT, LAST_INSERT_ID( ), and TIMESTAMP values.
RAND( ) in updates does not replicate properly. Use RAND(some_non_rand_expr) if you are replicating updates with RAND( ). You can, for example, use UNIX_TIME- STAMP( ) for the argument to RAND( ).
You have to use the same character set (--default-character-set) on the master and the slave. If not, you may get duplicate key errors on the slave because a key that is regarded as unique on the master may not be that in the other character set.
In 3.23, LOAD DATA INFILE will be handled properly as long as the file still resides on the master server at the time of update propagation. LOAD LOCAL DATA INFILE will be skipped. In 4.0, this limitation is not present—all forms of LOAD DATA INFILE are properly replicated.
Update queries that use user variables are not replication-safe (yet).
FLUSH commands are not stored in the binary log and therefore are not replicated to the slaves. This is not normally a problem, as FLUSH doesn’t change anything. This does, however, mean that if you update the MySQL privilege tables directly without using the the GRANT statement and you replicate the mysql privilege database, you must do a FLUSH PRIVILEGES on your slaves to put the new privileges into effect.
Temporary tables starting in 3.23.29 are replicated properly except when you shut down the slave server (not just the slave thread), you have some temporary tables open, and they are used in subsequent updates. To deal with this problem shutting down the slave, do SLAVE STOP, check Slave_open_temp_tables variable to see if it is 0, then issue mysqladmin shutdown. If the number is not 0, restart the slave thread with SLAVE START and see if you have better luck next time. There will be a cleaner solution, but it has to wait until version 4.0. In earlier versions temporary tables are not replicated properly—we recommend that you either upgrade, or execute SET SQL_LOG_BIN=0 on your clients before all queries with temp tables.
MySQL only supports one master and many slaves. In 4.x, we will add a voting algorithm to automatically change master if something goes wrong with the current master. We will also introduce ‘agent’ processes to help perform load balancing by sending select queries to different slaves.
Starting in Version 3.23.26, it is safe to connect servers in a circular master-slave relationship with log-slave-updates enabled. Note, however, that many queries will not work right in this kind of setup unless your client code is written to take care of the potential problems that can happen from updates that occur in different sequences on different servers.
This means that you can do a setup like the following:
```
A -> B -> C -> A
```
This setup will work only if you do non-conflicting updates between the tables. In other words, if you insert data in A and C, you should never inserted a row in A that may have a conflicting key with a row insert in C. You should also not update the same rows on two servers if the order in which the updates are applied matters.
Note that the log format has changed in Version 3.23.26 so that pre-3.23.26 slaves will not be able to read it.
If the query on the slave gets an error, the slave thread will terminate, and a message will appear in the .err file. You should then connect to the slave manually, fix the cause of the error (for example, non-existent table), and then run the SLAVE START SQL command (available starting in Version 3.23.16). In Version 3.23.15, you will have to restart the server.
If connection to the master is lost, the slave will retry immediately, and then, in case of failure, every master-connect-retry (default 60 seconds). Because of this, it is safe to shut down the master, and then restart it after a while. The slave will also be able to deal with network connectivity outages.
Shutting down the slave (cleanly) is also safe, as it keeps track of where it left off. Unclean shutdowns might produce problems, especially if disk cache was not synced before the system died. Your system fault tolerance will be greatly increased if you have a good UPS.
If the master is listening on a non-standard port, you will also need to specify this with master-port parameter in my.cnf .
In Version 3.23.15, all of the tables and databases will be replicated. Starting in Version 3.23.16, you can restrict replication to a set of databases with replicate-do-db directives in my.cnf or just exclude a set of databases with replicate-ignore-db. Note that up until Version 3.23.23, there was a bug that did not properly deal with LOAD DATA INFILE if you did it in a database that was excluded from replication.
Starting in Version 3.23.16, SET SQL_LOG_BIN = 0 will turn off replication (binary) logging on the master, and SET SQL_LOG_BIN = 1 will turn it back on. You must have the process privilege to do this.
Starting in Version 3.23.19, you can clean up stale replication leftovers when something goes wrong and you want a clean start with FLUSH MASTER and FLUSH SLAVE commands. In Version 3.23.26 we have renamed them to RESET MASTER and RESET SLAVE, respectively, to clarify what they do. The old FLUSH variants still work, though, for compatibility.
Starting in Version 3.23.23, you can change masters and adjust log position with CHANGE MASTER TO.
Starting in Version 3.23.23, you tell the master that updates in certain databases should not be logged to the binary log with binlog-ignore-db.
Starting in Version 3.23.26, you can use replicate-rewrite-db to tell the slave to apply updates from one database on the master to the one with a different name on the slave.
Starting in Version 3.23.28, you can use PURGE MASTER LOGS TO 'log-name' to get rid of old logs while the slave is running.
Due to the non-transactional nature of MyISAM tables, it is possible to have a query that will only partially update a table and return an error code. This can happen, for example, on a multi-row insert that has one row violating a key constraint, or if a long update query is killed after updating some of the rows. If that happens on the master, the slave thread will exit and wait for the DBA to decide what to do about it unless the error code is legitimate and the query execution results in the same error code. If this error code validation behaviour is not desirable, some (or all) errors could be masked out with the slave-skip-errors option starting in Version 3.23.47.
While individual tables can be excluded from replication with replicate-do-table/replicate-ignore-table or replicate-wild-do-table/replicate-wild-ignore-table, there are currently some design deficiencies that in some rather rare cases produce unexpected results. The replication protocol does not inform the slave explicitly which tables are going to be modified by the query—so the slave has to parse the query to know this. To avoid redundant parsing for queries that will end up actually being executed, table exclusion is currently implemented by sending the query to the standard MySQL parser, which will short-circuit the query and report success if it detects that the table should be ignored. In addition to several inefficiencies, this approach is also more bug-prone, and there are two known bugs as of Version 3.23.49—because the parser automatically opens the table when parsing some queries, the ignored table has to exist on the slave. The other bug is that if the ignored table gets partially updated, the slave thread will not notice that the table actually should have been ignored and will suspend the replication process. While these bugs are conceptually very simple to fix, we have not yet found a way to do this without a significant code change that would compromise the stability status of the 3.23 branch. There exists a workaround for both if in the rare case it happens to affect your application—use slave-skip-errors.

Replication Options in `my.cnf`

If you are using replication, we recommend that you use MySQL Version 3.23.30 or later. Older versions work, but they do have some bugs and are missing some features. Some of the options mentioned here may not be available in your version if it is not the most recent one. For all options specific to the the 4.0 branch, there is a note indicating so. Otherwise, if you discover that the option you are interested in is not available in your 3.23 version, and you really need it, please upgrade to the most recent 3.23 branch.

Please be aware that 4.0 branch is still in alpha, so some things may not be working as smoothly as you would like. If you really would like to try the new features of 4.0, we recommend you do it in such a way that in case there is a problem, your mission-critical applications will not be disrupted.

On both master and slave you need to use the server-id option. This sets an unique replication id. You should pick a unique value in the range between 1 to 2^32-1 for each master and slave. Example: server-id=3.

The following table describes the options you can use for the MASTER:

Option	Description
log-bin=filename	Write to a binary update log to the specified location. Note that if you give it a parameter with an extension (for example, log-bin=/mysql/logs/replication.log ) versions up to 3.23.24 will not work right during replication if you do FLUSH LOGS. The problem is fixed in Version 3.23.25. If you are using this kind of log name, FLUSH LOGS will be ignored on binlog. To clear the log, run FLUSH MASTER, and do not forget to run FLUSH SLAVE on all slaves. In Versions 3.23.26 and later, you should use RESET MASTER and RESET SLAVE.
log-bin-index=filename	Because the user could issue the FLUSH LOGS command, we need to know which log is currently active and which ones have been rotated out and in what sequence. This information is stored in the binary log index file. The default is `hostname`.index. You should not need to change this. Example: log-bin-index=db.index
sql-bin-update-same	If set, setting SQL_LOG_BIN to a value will automatically set SQL_LOG_UPDATE to the same value, and vice versa.
binlog-do-db=database_name	Tells the master that it should log updates to the binary log if the current database is database_name. All other databases are ignored. Note that if you use this, you should ensure that you do updates only in the current database. Example: binlog-do-db=sales
binlog-ignore-db=database_name	Tells the master that updates where the current database is database_name should not be stored in the binary log. Note that if you use this, you should ensure that you do updates only in the current database. Example: binlog-ignore-db=accounting.

The following table describes the options you can use for the SLAVE:

Option	Description
master-host=host	Master hostname or IP address for replication. If not set, the slave thread will not be started. Note that the setting of master-host will be ignored if there exists a valid `master.info` file. Probably a better name for this option would have been something like bootstrap-master-host, but it is too late to change now. Example: master-host=db-master.mycompany.com
master-user=username	The username the slave thread will use for authentication when connecting to the master. The user must have the file privilege. If the master user is not set, user test is assumed. The value in `master.info` will take precedence if it can be read. Example: master-user=scott
master-password=password	The password the slave thread will authenticate with when connecting to the master. If not set, an empty password is assumed.The value in `master.info` will take precedence if it can be read. Example: master-password=tiger
master-port=portnumber	The port the master is listening on. If not set, the compiled setting of MYSQL_PORT is assumed. If you have not tinkered with configure options, this should be 3306. The value in `master.info` will take precedence if it can be read. Example: master-port=3306
master-connect-retry=seconds	The number of seconds the slave thread will sleep before retrying to connect to the master in case the master goes down or the connection is lost. Default is 60. Example: master-connect-retry=60
master-ssl	Available after 4.0.0. Turn SSL on for replication. Be warned that this is a relatively new feature. Example: master-ssl
master-ssl-key	Available after 4.0.0. Master SSL keyfile name. Only applies if you have enabled master-ssl. Example: master-ssl-key=SSL/master-key.pem
master-ssl-cert	Available after 4.0.0. Master SSL certificate file name. Only applies if you have enabled master-ssl. Example: master-ssl-key=SSL/master-cert.pem
master-info-file=filename	The location of the file that remembers where we left off on the master during the replication process. The default is `master.info` in the data directory. You should not need to change this. Example: master-info-file=master.info
report-host	Available after 4.0.0. Hostname or IP of the slave to be reported to the master during slave registration. Will appear in the output of SHOW SLAVE HOSTS. Leave unset if you do not want the slave to register itself with the master. Note that it is not sufficient for the master to simply read the IP of the slave off the socket once the slave connects. Due to NAT and other routing issues, that IP may not be valid for connecting to the slave from the master or other hosts. Example: report-host=slave1.mycompany.com.
report-port	Available after 4.0.0. Port for connecting to slave reported to the master during slave registration. Set it only if the slave is listening on a non-default port or if you have a special tunnel from the master or other clients to the slave. If not sure, leave this option unset.
replicate-do-table=db_name.table_name	Tells the slave thread to restrict replication to the specified table. To specify more than one table, use the directive multiple times, once for each table. This will work for cross-database updates, in contrast to replicate-do-db. Example: replicate-do-table=some_db.some_table
replicate-ignore-table=db_name.table_name	Tells the slave thread to not replicate to the specified table. To specify more than one table to ignore, use the directive multiple times, once for each table. This will work for cross-datbase updates, in contrast to replicate-ignore-db. Example: replicate-ignore-table=db_name.some_table
replicate-wild-do-table=db_name.table_name	Tells the slave thread to restrict replication to the tables that match the specified wildcard pattern. To specify more than one table, use the directive multiple times, once for each table. This will work for cross-database updates. Example: replicate-wild-do-table=foo%.bar% will replicate only updates to tables in all databases that start with foo and whose table names start with bar.
replicate-wild-ignore-table=db_name.table_name	Tells the slave thread to not replicate to the tables that match the given wildcard pattern. To specify more than one table to ignore, use the directive multiple times, once for each table. This will work for cross-database updates. Example: replicate-wild-ignore-table=foo%.bar% will not do updates to tables in databases that start with foo and whose table names start with bar.
replicate-ignore-db=database_name	Tells the slave thread to not replicate to the specified database. To specify more than one database to ignore, use the directive multiple times, once for each database. This option will not work if you use cross-database updates. If you need cross database updates to work, make sure you have 3.23.28 or later, and use replicate-wild-ignore-table=db_name.%. Example: replicate-ignore-db=some_db
replicate-do-db=database_name	Tells the slave thread to restrict replication to the specified database. To specify more than one database, use the directive multiple times, once for each database. Note that this will only work if you do not use cross-database queries such as UPDATE some_db.some_table SET foo='bar' while having selected a different or no database. If you need cross-database updates to work, make sure you have 3.23.28 or later, and use replicate-wild-do-table=db_name.%. Example: replicate-do-db=some_db
log-slave-updates	Tells the slave to log the updates from the slave thread to the binary log. Off by default. You will need to turn it on if you plan to daisy-chain the slaves.
replicate-rewrite-db=from_name->to_name	Updates to a database with a different name than the original. Example: replicate-rewrite-db=master_db_name->slave_db_name
slave-skip-errors= err_code1,err_code2,...	Available only in 3.23.47 and later. Tells the slave thread to continue replication when a query returns an error from the provided list. Normally, replication will discontinue when an error is encountered, giving the user a chance to resolve the inconsistency in the data manually. Do not use this option unless you fully understand why you are getting the errors. If there are no bugs in your replication setup and client programs, and no bugs in MySQL itself, you should never get an abort with error. Indiscriminate use of this option will result in slaves being hopelessly out of sync with the master and you having no idea how the problem happened. For error codes, you should use the numbers provided by the error message in your slave error log and in the output of SHOW SLAVE STATUS. A full list of error messages can be found in the source distribution in `Docs/mysqld_error.txt`. You can (but should not) also use a very non-recommended value of all which will ignore all error messages and keep barging along regardless. Needless to say, if you use it, we make no promises regarding your data integrity. Please do not complain if your data on the slave is not anywhere close to what it is on the master in this case—you have been warned. Example: slave-skip-errors=1062,1053 or slave-skip-errors=all.
skip-slave-start	Tells the slave server not to start the slave on the startup. The user can start it later with SLAVE START.
slave_read_timeout=#	Number of seconds to wait for more data from the master before aborting the read.

SQL Commands Related to Replication

Replication can be controlled through the SQL interface. Here is the summary of commands:

Command	Description
SLAVE START	Starts the slave thread. (Slave)
SLAVE STOP	Stops the slave thread. (Slave)
SET SQL_LOG_BIN=0	Disables update logging if the user has the process privilege. Ignored otherwise. (Master)
SET SQL_LOG_BIN=1	Re-enables update logging if the user has the process privilege. Ignored otherwise. (Master)
SET SQL_SLAVE_SKIP_COUNTER=n	Skip the next n events from the master. Only valid when the slave thread is not running; otherwise, gives an error. Useful for recovering from replication glitches.
RESET MASTER	Deletes all binary logs listed in the index file, resetting the binlog index file to be empty. In pre-3.23.26 versions, use FLUSH MASTER. (Master)
RESET SLAVE	Makes the slave forget its replication position in the master logs. In pre-3.23.26 versions, the command was called FLUSH SLAVE. (Slave)
LOAD TABLE tblname FROM MASTER	Downloads a copy of the table from master to the slave. Implemented mainly for debugging of LOAD DATA FROM MASTER, but some “gourmet” users might find it useful for other things. Do not use it if you consider yourself the average “non-hacker” type user. (Slave)
LOAD DATA FROM MASTER	Available starting in 4.0.0. Takes a snapshot of the master and copies it to the slave. Updates the values of MASTER_LOG_FILE and MASTER_LOG_POS so that the slave will start replicating from the correct position. Will honor table and database exclusion rules specified with replicate-* options. So far works only with MyISAM tables and acquires a global read lock on the master while taking the snapshot. In the future it is planned to make it work with InnoDB tables and to remove the need for global read lock using the non-blocking online backup feature.
CHANGE MASTER TO master_def_list	Changes the master parameters to the values specified in master_def_list and restarts the slave thread. master_def_list is a comma-separated list of master_def where master_def is one of the following: MASTER_HOST, MASTER_USER, MASTER_PASSWORD, MASTER_PORT, MASTER_CONNECT_RETRY, MASTER_LOG_FILE, or MASTER_LOG_POS. For example: CHANGE MASTER TO MASTER_HOST='master2.mycompany.com', MASTER_USER='replication', MASTER_PASSWORD='bigs3cret', MASTER_PORT=3306, MASTER_LOG_FILE='master2-bin.001', MASTER_LOG_POS=4;
CHANGE MASTER TO master_def_list (continued)	You only need to specify the values that need to be changed. The values that you omit will stay the same with the exception of when you change the host or the port. In that case, the slave will assume that since you are connecting to a different host or a different port, the master is different. Therefore, the old values of log and position are not applicable anymore, and will automatically be reset to an empty string and 0, respectively (the startvalues). Note that if you restart the slave, it will remember its last master. If this is not desirable, you should delete the `master.info` file before restarting, and the slave will read its master from `my.cnf` or the command-line.This command is useful for setting up a slave when you have the snapshot of the master and have recorded the log and the offset on the master that the snapshot corresponds to. You can run CHANGE MASTER TO MASTER_LOG_FILE='log_name_on_master', MASTER_LOG_POS=log_offset_on_master on the slave after restoring the snapshot.(Slave)
SHOW MASTER STATUS	Provides status information on the binlog of the master. (Master)
SHOW SLAVE HOSTS	Available after 4.0.0. Gives a listing of slaves currently registered with the master. (Master)
SHOW SLAVE STATUS	Provides status information on essential parameters of the slave thread. (Slave)
SHOW MASTER LOGS	Only available starting in Version 3.23.28. Lists the binary logs on the master. You should use this command prior to PURGE MASTER LOGS TO to find out how far you should go. (Master)
SHOW BINLOG EVENTS [ IN 'logname' ] [ FROM pos ] [ LIMIT [offset,] rows ]	Shows the events in the binary update log. Primarily used for testing/debugging, but can also be used by regular clients that for some reason need to read the binary log contents. (Master)
SHOW NEW MASTER FOR SLAVE WITH MASTER_LOG_FILE='logfile' AND MASTER_LOG_POS=pos AND MASTER_LOG_SEQ=log_seq AND MASTER_SERVER_ID=server_id	This command is used when a slave of a possibly dead/unavailable master needs to be switched to replicate off another slave that has been replicating the same master. The command will return recalculated replication coordinates, and the output can be used in a subsequent CHANGE MASTER TO command. Normal users should never need to run this command. It is primarily reserved for internal use by the fail-safe replication code. We may later change the syntax if we find a more intuitive way to describe this operation.
PURGE MASTER LOGS TO 'logname'	Available starting in Version 3.23.28. Deletes all the replication logs that are listed in the log index as being prior to the specified log, and removes them from the log index, so that the given log now becomes the first. Example: PURGE MASTER LOGS TO 'mysql-bin.010' This command will do nothing and fail with an error if you have an active slave that is currently reading one of the logs you are trying to delete. However, if you have a dormant slave, and happen to purge one of the logs it wants to read, the slave will be unable to replicate once it comes up. The command is safe to run while slaves are replicating—you do not need to stop them.You must first check all the slaves with SHOW SLAVE STATUS to see which log they are on, then do a listing of the logs on the master with SHOW MASTER LOGS, find the earliest log among all the slaves (if all the slaves are up to date, this will be the last log on the list), back up all the logs you are about to delete (optional), and purge up to the target log.

Replication FAQ

Q: How do I configure a slave if the master is already running and I do not want to stop it?

A: There are several options. If you have taken a backup of the master at some point and recorded the binlog name and offset (from the output of SHOW MASTER STATUS ) corresponding to the snapshot, do the following:

Make sure a unique server id is assigned to the slave.
Execute CHANGE MASTER TO MASTER_HOST='master-host-name', MASTER_USER='master-user-name', MASTER_PASSWORD='master-pass', MASTER_LOG_FILE='recorded-log-name', MASTER_LOG_POS=recorded_log_pos.
Execute SLAVE START.

If you do not have a backup of the master already, here is a quick way to do it consistently:

FLUSH TABLES WITH READ LOCK.
gtar zcf /tmp/backup.tar.gz /var/lib/mysql ( or a variation of this).
SHOW MASTER STATUS—make sure to record the output, as you will need it later.
UNLOCK TABLES.

Afterward, follow the instructions for the case when you have a snapshot and have recorded the log name and offset. You can use the same snapshot to set up several slaves. As long as the binary logs of the master are left intact, you can wait as long as several days or in some cases maybe a month to set up a slave once you have the snapshot of the master. In theory the waiting gap can be infinite. The two practical limitations are the disk space of the master getting filled with old logs, and the amount of time it will take the slave to catch up.

In Versions 4.0.0 and newer, you can also use LOAD DATA FROM MASTER. This is a convenient command that will take a snapshot, restore it to the slave, and adjust the log name and offset on the slave all at once. In the future, LOAD DATA FROM MASTER will be the recommended way to set up a slave. Be warned, however, that the read lock may be held for a long time if you use this command. It is not yet implemented as efficiently as we would like to have it. If you have large tables, the preferred method at this time is still with a local tar snapshot after executing FLUSH TABLES WITH READ LOCK.

Q: Does the slave need to be connected to the master all the time?

A: No, it does not. You can have the slave go down or stay disconnected for hours or even days, then reconnect, catch up on the updates, and then disconnect or go down for a while again. So you can, for example, use the master-slave setup over a dial-up link that is up only for short periods of time. The implications of that are that at any given time the slave is not guaranteed to be in sync with the master unless you take some special measures. In the future, we will have the option to block the master until at least one slave is in sync.

Q: How do I force the master to block updates until the slave catches up?

A: Execute the following commands:

Master: FLUSH TABLES WITH READ LOCK.
Master: SHOW MASTER STATUS—record the log name and the offset.
Slave: SELECT MASTER_POS_WAIT('recorded_log_name', recorded_log_offset). When the select returns, the slave is currently in sync with the master.
Master: UNLOCK TABLES—now the master will continue updates.

Q: Why do I sometimes see more than one Binlog_Dump thread on the master after I have restarted the slave?

A: Binlog_Dump is a continuous process that is handled by the server in the following way:

Catch up on the updates.
Once there are no more updates left, go into pthread_cond_wait( ), from which we can be awakened either by an update or a kill.
On wake-up, check the reason. If we are not supposed to die, continue the Binlog_dump loop.
If there is some fatal error, such as detecting a dead client, terminate the loop.

So if the slave thread stops on the slave, the corresponding Binlog_Dump thread on the master will not notice it until after at least one update to the master (or a kill), which is needed to wake it up from pthread_cond_wait( ). In the meantime, the slave could have opened another connection, which resulted in another Binlog_Dump thread.

The preceding problem should not be present in Versions 3.23.26 and later. In Version 3.23.26 we added server-id to each replication server, and now all the old zombie threads are killed on the master when a new replication thread connects from the same slave.

Q: How do I rotate replication logs?

A: In Version 3.23.28 you should use the PURGE MASTER LOGS TO command after determining which logs can be deleted, and optionally backing them up first. In earlier versions the process is much more painful, and cannot be safely done without stopping all the slaves in the case that you plan to re-use log names. You will need to stop the slave threads, edit the binary log index file, delete all the old logs, restart the master, start slave threads, and then remove the old log files.

Q: How do I upgrade on a hot replication setup?

A: If you are upgrading pre-3.23.26 versions, you should just lock the master tables, let the slave catch up, run FLUSH MASTER on the master and FLUSH SLAVE on the slave to reset the logs, then restart new versions of the master and the slave. Note that the slave can stay down for some time. Since the master is logging all the updates, the slave will be able to catch up once it is up and can connect.

After 3.23.26, we have locked the replication protocol for modifications, so you can upgrade masters and slaves on the fly to a newer 3.23 versions and you can have different versions of MySQL running on the slave and the master, as long as they are both newer than 3.23.26.

Q: What issues should I be aware of when setting up two-way replication?

A: MySQL replication currently does not support any locking protocol between master and slave to guarantee the atomicity of a distributed (cross-server) update. In other words, it is possible for client A to make an update to co-master 1, and in the meantime, before it propagates to co-master 2, client B could make an update to co-master 2 that will make the update of client A work differently than it did on co-master 1. Thus, when the update of client A will make it to co-master 2, it will produce tables that will be different from what you have on co-master 1, even after all the updates from co-master 2 have also propagated. So you should not co-chain two servers in a two-way replication relationship, unless you are sure that your updates can safely happen in any order, or unless you take care of mis-ordered updates somehow in the client code.

You must also realise that two-way replication actually does not improve performance very much, if at all, as far as updates are concerned. Both servers need to do the same amount of updates each, as you would have one server do. The only difference is that there will be a little less lock contention because the updates originating on another server will be serialised in one slave thread. This benefit, though, might be offset by network delays.

Q: How can I use replication to improve performance of my system?

A: You should set up one server as the master, direct all writes to it, and configure as many slaves as you have the money and rackspace for, distributing the reads among the master and the slaves. You can also start the slaves with --skip-bdb, --low-priority-updates and --delay-key-write-for-all-tables to get speed improvements for the slave. In this case the slave will use non-transactional MyISAM tables instead of BDB tables to get more speed.

Q: What should I do to prepare my client code to use performance-enhancing replication?

A: If the part of your code that is responsible for database access has been properly abstracted/modularised, converting it to run with the replicated setup should be very smooth and easy. Just change the implementation of your database access to read from some slave or the master, and to always write to the master. If your code does not have this level of abstraction, setting up a replicated system will give you an opportunity/motivation to it clean up. You should start by creating a wrapper library /module with the following functions:

safe_writer_connect( )
safe_reader_connect( )
safe_reader_query( )
safe_writer_query( )

safe_ means that the function will take care of handling all the error conditions.

You should then convert your client code to use the wrapper library. It may be a painful and scary process at first, but it will pay off in the long run. All applications that follow the preceding pattern will be able to take advantage of one-master/many-slaves solution. The code will be a lot easier to maintain, and adding troubleshooting options will be trivial. You will just need to modify one or two functions, for example, to log how long each query took, or which query, among your many thousands, gave you an error. If you have written a lot of code already, you may want to automate the conversion task by using Monty’s replace utility, which comes with the standard distribution of MySQL, or just write your own Perl script. Hopefully, your code follows some recognisable pattern. If not, you are probably better off rewriting it anyway, or at least going through and manually beating it into a pattern.

Note that, of course, you can use different names for the functions. What is important is having a unified interface for connecting for reads, connecting for writes, doing a read, and doing a write.

Q: When and how much can MySQL replication improve the performance of my system?

A: MySQL replication is most beneficial for a system with frequent reads and not-so-frequent writes. In theory, by using a one-master/many-slaves setup you can scale by adding more slaves until you either run out of network bandwidth, or your update load grows to the point that the master cannot handle it.

In order to determine how many slaves you can get before the added benefits begin to level out, and how much you can improve performance of your site, you need to know your query patterns, and empirically (by benchmarking) determine the relationship between the throughput on reads (reads per second, or max_reads) and on writes max_writes) on a typical master and a typical slave. The following example will show you a rather simplified calculation of what you can get with replication for our imagined system.

Let’s say our system load consists of 10% writes and 90% reads, and we have determined that max_reads = 1200 - 2 * max_writes, or in other words, our system can do 1200 reads per second with no writes, our average write is twice as slow as our average read, and the relationship is linear. Let us suppose that our master and slave are of the same capacity, and we have N slaves and 1 master. Then we have for each server (master or slave):

reads = 1200 - 2 * writes (from benchmarks)
reads = 9* writes / (N + 1) (reads split, but writes go to all servers)
9*writes/(N+1) + 2 * writes = 1200
writes = 1200/(2 + 9/(N+1)

So if N = 0, which means we have no replication, our system can handle 1200/11, or about 109 writes per second (which means we will have 9 times as many reads due to the nature of our application).

If N = 1, we can get up to 184 writes per second.
If N = 8, we get up to 400 writes.
If N = 17, we get 480 writes.

Eventually as N approaches infinity (and our budget negative infinity), we can get very close to 600 writes per second, increasing system throughput by about 5.5 times. However, with only 8 servers, we increased it almost 4 times already.

Note that our computations assumed infinite network bandwidth, and neglected several other factors that could turn out to be significant on your system. In many cases, you may not be able to make a computation similar to the preceding one that will accurately predict what will happen on your system if you add N replication slaves. However, answering the following questions should help you decide whether and how much, if at all, the replication will improve the performance of your system:

What is the read/write ratio on your system?
How much more write load can one server handle if you reduce the reads?
How many slaves do you have bandwidth for on your network?

Q: How can I use replication to provide redundancy/high availability?

A: With the currently available features, you would have to set up a master and a slave (or several slaves), write a script that will monitor the master to see if it is up, and instruct your applications and the slaves of the master change in case of failure. Some suggestions:

To tell a slave to change the master, use the CHANGE MASTER TO command.
A good way to keep your applications informed as to the location of the master is to have a dynamic DNS entry for the master. With bind you can use nsupdate to dynamically update your DNS.
You should run your slaves with the log-bin option and without log-slave-updates. This way the slave will be ready to become a master as soon as you issue STOP SLAVE, RESET MASTER, and CHANGE MASTER TO on the other slaves. It will also help you catch spurious updates that may happen because of misconfiguration of the slave (ideally, you want to configure access rights so that no client can update the slave, except for the slave thread) combined with the bugs in your client programs (they should never update the slave directly).

We are currently working on integrating an automatic master election system into MySQL, but until it is ready, you will have to create your own monitoring tools.

Troubleshooting Replication

If you have followed the instructions and your replication setup is not working, first eliminate the user-error factor by checking the following:

Is the master logging to the binary log? Check with SHOW MASTER STATUS. If it is, Position will be non-zero. If not, verify that you have given the master log-bin option and have set server-id.
Is the slave running? Check with SHOW SLAVE STATUS. The answer is found in Slave_running column. If not, verify slave options and check the error log for messages.
If the slave is running, did it establish connection with the master? Do SHOW PROCESSLIST, find the thread with the system user value in the User column and none in the Host column, and check the State column. If it says connecting to master, verify the privileges for the replication user on the master, master hostname, your DNS setup, whether the master is actually running, whether it is reachable from the slave, and if all that seems okay, read the error logs.
If the slave was running but then stopped, look at SHOW SLAVE STATUS output and check the error logs. It usually happens when some query that succeeded on the master fails on the slave. This should never happen if you have taken a proper snapshot of the master, and have never modified the data on the slave outside of the slave thread. If it does, it is a bug. Report it as described later.
If a query that succeeded on the master refuses to run on the slave, and a full database resync (the proper thing to do) does not seem feasible, try the following:
- First see if there is some stray record in the way. Understand how it got there, then delete it and run SLAVE START.
- If that does not work or does not apply, try to understand if it would be safe to make the update manually (if needed) and then ignore the next query from the master.
- If you have decided you can skip the next query, do SET SQL_SLAVE_SKIP_COUNTER=1; SLAVE START; to skip a query that does not use AUTO_INCREMENT or use LAST_INSERT_ID( ), or SET SQL_SLAVE_SKIP_COUNTER=2; SLAVE START; otherwise. The reason queries that use AUTO_INCREMENT or LAST_INSERT_ID( ) are different is that they take two events in the binary log of the master.
- If you are sure the slave started out perfectly in sync with the master, and no one has updated the tables involved outside of the slave thread, report the bug, so you will not have to do perform these tricks again.
Make sure you are not running into an old bug by upgrading to the most recent version.
If all else fails, read the error logs. If they are big, perform grep -i slave /path/to/your-log.err on the slave. There is no generic pattern to search for on the master, as the only errors it logs are general system errors. If it can, it will send the error to the slave when things go wrong.

When you have determined that there is no user error involved, and replication still either does not work at all or is unstable, it is time to start working on a bug report. We need to get as much info as possible from you to be able to track down the bug. Please spend some time and effort preparing a good bug report. Ideally, we would like to have a test case in the format found in the mysql-test/t/rpl* directory of the source tree. If you submit a test case like that, you can expect a patch within a day or two in most cases, although, of course, your mileage may vary depending on a number of factors.

The second best option is to write a simple program with easily configurable connection arguments for the master and the slave that will demonstrate the problem on our systems. You can write one in Perl or in C, depending on which language you know better.

If you use one of these methods to demonstrate the bug, use mysqlbug to prepare a bug report and send it to bugs@lists.mysql.com. If you have a phantom, a problem that does occur but that you cannot duplicate “at will”:

Verify that there is no user error involved. For example, if you update the slave outside of the slave thread, the data will be out of sync, and you can have unique key violations on updates, in which case the slave thread will stop and wait for you to clean up the tables manually to bring them in sync.
Run slave with log-slave-updates and log-bin. This will keep a log of all updates on the slave.
Save all evidence before resetting the replication. If we have no or only sketchy information, it will take us a while to track down the problem. Collect the following evidence:
- All binary logs on the master
- All binary logs on the slave
- The output of SHOW MASTER STATUS on the master at the time you have discovered the problem
- The output of SHOW SLAVE STATUS on the master at the time you have discovered the problem
- Error logs on the master and on the slave
Use mysqlbinlog to examine the binary logs. The following should be helpful to find the trouble query, for example:
```
mysqlbinlog -j pos_from_slave_status /path/to/log_from_slave_status | head
```

Once you have collected the evidence on the phantom problem, try hard to isolate it into a separate test case first. Then report the problem to bugs@lists.mysql.com with as much info as possible.

Get MySQL Reference Manual now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.

Start your free trial

MySQL Reference Manual by Michael Widenius, David Axmark, Kaj Arno

Replication in MySQL

Introduction

Replication Implementation Overview

How to Set Up Replication

Replication Features and Known Problems

Replication Options in `my.cnf`

SQL Commands Related to Replication

Replication FAQ

Troubleshooting Replication

Don’t leave empty-handed

It’s yours, free.

Check it out now on O’Reilly

Replication in MySQL

Introduction

Replication Implementation Overview

How to Set Up Replication

Replication Features and Known Problems

Replication Options in my.cnf

SQL Commands Related to Replication

Replication FAQ

Troubleshooting Replication

Don’t leave empty-handed

It’s yours, free.

Check it out now on O’Reilly

Replication Options in `my.cnf`