
Redis Cookbook

By Tiago Macedo... Published by O'Reilly Media, Inc.

Chapter 4. Redis Administration and Maintenance

In this chapter, we’ll try to focus on recipes related to operating Redis servers, instead of programming applications or data modeling. These tasks vary widely, but include starting a Redis slave, upgrading an existing server, performing backups, sharding, and handling a dataset larger than your available memory.

Configuring Persistence

Problem

One of the advantages of Redis over other key/value stores like memcached is its support for persistence; in fact, it even comes preconfigured with this support. This functionality enables you to perform some operations that wouldn’t be possible otherwise, like upgrading your server without downtime or performing backups.

Nevertheless, persistence should be configured in a way that suits your dataset and usage patterns.

Solution

The default persistence model is snapshotting, which consists of saving the entire database to disk in the RDB format (basically a compressed database dump). Snapshots are taken periodically, whenever a configurable number of keys has changed within a configurable number of seconds.

The alternative is using an Append Only File (AOF). This might be a better option if you have a large dataset or your data doesn’t change very frequently.

Discussion

Snapshotting

As previously stated, snapshotting is the default persistence mode for Redis. It asynchronously performs a full dump of your database to disk, overwriting the previous dump only if successful. Therefore, the latest dump should always be in your dbfilename location.

You can configure snapshotting with save statements in your configuration file, using the following format:

save seconds keys-changed

A snapshot is taken when both conditions are met: at least seconds seconds have passed since the last snapshot, and at least keys-changed keys have changed in that time. A typical example, which ensures that all your data is saved every few minutes, is save 600 1. It performs a snapshot every 10 minutes if any key in your server has changed.
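The following Python sketch models that decision. The function names are hypothetical. The sketch assumes, as Redis does, that when several save lines are present, a snapshot is due as soon as any one of them has both of its conditions met.

```python
def parse_save_lines(config_text):
    """Collect (seconds, keys_changed) pairs from 'save' lines in redis.conf-style text."""
    rules = []
    for line in config_text.splitlines():
        parts = line.split()
        # Only "save <seconds> <keys-changed>" lines count; comments and other directives are skipped.
        if len(parts) == 3 and parts[0] == "save":
            rules.append((int(parts[1]), int(parts[2])))
    return rules


def snapshot_due(rules, seconds_since_last_save, keys_changed):
    """True when at least one save rule has both of its conditions met."""
    return any(
        seconds_since_last_save >= seconds and keys_changed >= changes
        for seconds, changes in rules
    )
```

With only save 600 1 configured, for example, a server that has been idle for an hour takes no snapshot, because no key has changed.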

You can manually trigger snapshotting with the SAVE and BGSAVE commands. BGSAVE forks the main Redis process and saves the DB to disk in the background. Redis also performs this operation on its own whenever the conditions of a save statement in your configuration file are met. SAVE performs the same operation as BGSAVE but does so in the foreground, thereby blocking your Redis server.

If you come to the conclusion that snapshotting is putting too much strain on your Redis servers you might want to consider using slaves for persistence (by commenting out all the save statements in your masters and enabling them only on the slaves), or using AOF instead. In particular, if you have a big dataset or a dataset that doesn’t change often, consider using AOF.

AOF

The Append Only File persistence mode, enabled with appendonly yes in your configuration file, keeps a log of the commands that change your dataset in a separate file. As with most writes on modern operating systems, data logged to the AOF is first held in memory buffers. It only reaches the disk when the operating system flushes those buffers or when Redis forces it with the fsync system call. You can configure how often the AOF gets synced to disk by putting appendfsync statements in your configuration file. Valid options are always, everysec, and no.
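To make the three appendfsync options concrete, here is a toy Python append-only log that calls os.fsync according to each policy. The class is a hypothetical illustration of where the sync happens, not Redis's implementation. Redis logs commands in its own protocol format and, with everysec, performs the sync from a background thread. This sketch simply checks the clock on each append.

```python
import os
import time


class ToyAppendOnlyLog:
    """Toy append-only log showing where fsync happens under each appendfsync policy."""

    def __init__(self, path, appendfsync="everysec"):
        if appendfsync not in ("always", "everysec", "no"):
            raise ValueError("appendfsync must be always, everysec, or no")
        self.policy = appendfsync
        # O_APPEND makes every write land at the end of the file.
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.last_fsync = time.monotonic()

    def append(self, *command):
        # One command per line here; Redis itself writes its protocol format.
        os.write(self.fd, (" ".join(command) + "\n").encode())
        if self.policy == "always":
            os.fsync(self.fd)  # on disk before append() returns: safest, slowest
        elif self.policy == "everysec":
            now = time.monotonic()
            if now - self.last_fsync >= 1.0:  # fsync at most once per second
                os.fsync(self.fd)
                self.last_fsync = now
        # "no": never fsync; the OS flushes its buffers whenever it decides to

    def close(self):
        os.close(self.fd)
```

The tradeoff is visible in the code. always pays for a disk sync on every write, while no leaves the timing entirely to the operating system.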

Warning

Disabling fsync is not safe, as it leaves the decision to your operating system about when to actually write the data to disk.

AOF can be used together with snapshotting. But you might decide to suppress snapshots because they put too much load on your server. If you’re not snapshotting, be sure to write the AOF to a RAID array or have at least one Redis slave that you can recover data from in case of disaster.

Note

BGREWRITEAOF rewrites the AOF to match the current database. Depending on how often you update existing data, this can greatly reduce the size of the AOF. If your data changes very often, the on-disk file will grow very fast, so you should compact it by issuing BGREWRITEAOF regularly. The rewrite is done in the background.
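Conceptually, a rewrite keeps only what is needed to rebuild the current data. The toy Python model below handles only SET and DEL, and its function name is hypothetical. It collapses a log so that a key overwritten many times costs a single command. Redis itself builds the new file from the in-memory dataset rather than by replaying the old log, but the effect on file size follows the same idea.

```python
def rewrite_log(commands):
    """Collapse a command log into the fewest commands that rebuild its final state.

    Toy model: only SET and DEL on string keys are supported.
    """
    state = {}
    for cmd, key, *args in commands:
        if cmd == "SET":
            state[key] = args[0]
        elif cmd == "DEL":
            state.pop(key, None)
        else:
            raise ValueError(f"unsupported command in toy model: {cmd}")
    # One SET per surviving key, however many times it was overwritten.
    return [("SET", key, value) for key, value in state.items()]
```

Five logged commands that set a counter three times and create and then delete a temporary key compact to a single SET.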

Starting a Redis Slave

Problem

Database slaves are useful for a number of reasons. You might need them to load-balance your queries, keep hot standby servers, perform maintenance operations, or simply inspect your data.

Solution

Redis supports master-slave replication natively: you can have multiple slaves per master, and slaves can connect to other slaves. You can configure replication in the configuration file before starting a server, or by connecting to a running server and using the SLAVEOF command.

Discussion

In order to configure a Redis slave using the configuration file, you should add the following to your redis.conf:

slaveof master-ip-or-hostname masterport

Start or restart the server afterwards. Should your Redis master have password authentication enabled, you’ll need to specify it as well:

masterauth master-password

If you want to turn a running Redis server into a slave (or switch to a different master), you can do it using the SLAVEOF command:

SLAVEOF master-ip-or-hostname [masterport]

As in the previous example, if you’re using authentication, you’ll need to specify it beforehand:

CONFIG SET masterauth master-password

Keep in mind that should your server restart, this configuration will be lost. Therefore, you should also commit your changes to the configuration file.

Note

CONFIG GET allows you to read configuration parameters from a running Redis server. CONFIG SET enables you to set configuration parameters on a running Redis server. Please refer to the documentation for these commands’ parameters.
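If you prefer to script these changes, the following Python sketch speaks the Redis protocol directly over a socket, without a client library. The helper names encode_command and send_command are hypothetical, as are the host, port, and password in the commented usage lines. The sketch reads only single-line replies such as +OK. It also assumes the server you connect to does not itself require a password.

```python
import socket


def encode_command(*args):
    """Encode a command as a Redis multi-bulk request (an array of bulk strings)."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def send_command(host, port, *args):
    """Send one command and return the first reply line, e.g. b'+OK'."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(encode_command(*args))
        reply = b""
        while not reply.endswith(b"\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            reply += chunk
        return reply.split(b"\r\n", 1)[0]


# Hypothetical usage: make the server on port 6380 a slave of localhost:6379.
# send_command("localhost", 6380, "CONFIG", "SET", "masterauth", "master-password")
# send_command("localhost", 6380, "SLAVEOF", "localhost", 6379)
```

As with redis-cli, settings changed this way do not survive a restart unless you also add them to redis.conf.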

Handling a Dataset Larger Than Memory

Problem

Often you might find yourself with a dataset that won’t fit in your available memory. While you could try to get around that by adding more RAM or sharding (which in addition would allow you to scale horizontally), it might not be feasible or practical to do so.

Solution

Redis has supported a feature called Virtual Memory (VM) since version 2.0. This allows you to have a dataset bigger than your available RAM by swapping rarely used values to disk and keeping all the keys and the frequently used values in memory. However, this has one downside: before Redis reads or performs an operation on swapped values, they must be read into real memory.

Discussion

If you decide to use VM, you should be aware of its ideal use cases and the tradeoffs you’re making.

  • The keys are always kept in memory. This means that if you have a large number of small keys, VM might not be the best option for you, or you might have to change your data structure and use large strings, hashes, lists, or sets instead.

  • VM is ideal for some patterns of data access, not all. If you regularly query all your data, VM is probably not a good fit, because your Redis server might end up blocking clients in order to fetch the values from disk. VM is best suited to situations where your frequently accessed values fit comfortably in memory.

  • Doing a full dump of your Redis server will be extremely slow. In order to generate a snapshot, Redis needs to read all the values swapped to disk in order to write them to the RDB file (see Configuring Persistence). This generates a lot of I/O. Due to this, it might be better for you to use AOF as a persistence mode.

  • VM also affects the speed of replication, because Redis masters need to perform a BGSAVE when a new slave connects.

Still, there are scenarios where using VM makes sense. In order to enable it, you’ll need to add this to your configuration file:

vm-enabled yes

There are other settings that you should pay attention to when enabling VM:

  • vm-swap-file specifies the location of the swap file in your filesystem.

  • vm-max-memory allows you to specify the maximum amount of memory Redis should use before beginning to swap values. Beware that this is a soft limit, because keys are always kept in memory and because Redis won’t swap values to disk while creating a new snapshot.

  • vm-pages specifies the number of pages in your swap file.

  • vm-page-size defines the size of a page in bytes. The page size and the number of pages are very important, because Redis won’t allocate more than one value to the same page, so together these determine the amount of data your swap file can handle.

  • vm-max-threads is the maximum number of threads available to perform I/O operations. Setting it to 0 enables blocking VM, which means that your Redis server will block all clients when it needs to read a value from disk. Once again, depending on your data access patterns, this may or may not be the best option.
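To see how vm-pages and vm-page-size interact, here is a small Python calculation. The function names and example numbers are hypothetical. Value sizes refer to a value's size in the swap file, which you would have to estimate. A 100-byte value with 32-byte pages takes four pages (128 bytes). Because a page never holds more than one value, a 10-byte value still consumes a whole page.

```python
import math


def pages_for_value(value_bytes, page_size):
    """Pages one swapped value occupies: pages are never shared, so round up."""
    return max(1, math.ceil(value_bytes / page_size))


def swap_file_bytes(vm_pages, vm_page_size):
    """Size of the swap file implied by vm-pages and vm-page-size."""
    return vm_pages * vm_page_size


def values_that_fit(vm_pages, vm_page_size, typical_value_bytes):
    """Rough upper bound on how many values of a given size the swap file can hold."""
    return vm_pages // pages_for_value(typical_value_bytes, vm_page_size)
```

For example, vm-pages 134217728 with vm-page-size 32 gives a 4 GB (2^32 byte) swap file. How many values it holds depends on how many pages each value rounds up to, so a page size far larger than your typical value wastes most of the file.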

As with any other disk-based database, Redis VM will perform better the faster your I/O is, so fast storage such as flash-based SSDs is encouraged. You can read more about VM use cases, configuration details, and tradeoffs in the official Redis documentation.

Upgrading Redis

Problem

At some point in the life of your system you might need to upgrade Redis. Unfortunately, Redis can’t do online binary upgrades, and doing a server restart means that your application won’t be able to talk to Redis for a (possibly long) period of time. But that doesn’t mean that there aren’t other ways to achieve it without incurring downtime. You might also want to move your current Redis database to another system for maintenance purposes, a hardware upgrade, etc.

Solution

Our solution will involve starting a new Redis server in slave mode, switching over the clients to the slave and promoting the new server to the master role. To make the example easier to understand, let’s assume we have a Redis server listening on port 6379.

Note

It might be easier to start the slave on a new server than on the existing one. This is because of memory requirements, and because you can reuse the same configuration file, directories, and port for the slave, changing only the hostname or IP address.

  1. Install the new Redis version without restarting your existing server.

  2. Create a new redis.conf, specifying that Redis runs on port 6380 (assuming you’re on the same system—if you’re not, you can still use 6379 or any other available port) and a different DB directory (you don’t want to have 2 Redis servers reading or writing the same files).

  3. Start the new server.

  4. Connect to the new server and issue the command:

    SLAVEOF localhost 6379

    This will trigger a BGSAVE on the master server, and upon completion the new (slave) server will start replicating. You can check the current status using the INFO command on the slave. When you see master_link_status:up, the replication is active.

  5. Since your new Redis server is now up-to-date, you can start moving over your clients to this new server. You can verify the number of clients connected to a server with the INFO command; check the connected_clients variable.

  6. When all your clients are connected to the slave server, you still have two tasks to complete: disable the replication and shut down the master server.

Note

INFO returns information about the server including replication status, uptime, memory usage, number of keys per database and other statistics.

  7. Connect to the slave server and issue:

    SLAVEOF NO ONE

    This will stop replication and effectively promote your slave into a master. This is important in Redis 2.2, as master servers are responsible for sending expirations to their slaves.

  8. Now connect to your old master server and issue:

    SHUTDOWN

    The old master server will perform a SAVE and shut down.

  9. Your new Redis system is up and running, but make sure that all your configuration files, init scripts, backups, etc. point to the right location and start the correct server. It’s easy to forget these routine operations, but at the very least you should verify that nothing will go wrong if the server restarts.
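The checks in the replication and client-migration steps can be scripted. The following Python sketch parses the text returned by INFO, assuming the field:value lines seen earlier, such as master_link_status:up. Newer versions also print # Section headers, which it skips. The function names and the tolerated-clients threshold are hypothetical. You would get the INFO text from redis-cli INFO or your client library.

```python
def parse_info(text):
    """Turn INFO output ('field:value' lines) into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Replication".
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def replication_ready(slave_info):
    """The slave is replicating once master_link_status reports up."""
    return slave_info.get("master_link_status") == "up"


def clients_drained(master_info, tolerated=1):
    """At most `tolerated` clients (e.g. your own admin session) remain connected."""
    return int(master_info.get("connected_clients", "0")) <= tolerated
```

Whether replication links are counted in connected_clients may depend on your Redis version, so check the value on your own servers before relying on the threshold.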

Discussion

Doing an online upgrade has a couple of (possibly steep) requirements. You need to be able to point your Redis clients to another server, either by using a proxy, by building failover into your clients (so that they connect to a different server once you bring the master down), or simply by telling them to connect to another server. You’ll also need at least twice as much memory available (possibly on a different system).

Beware that doing this might be dangerous, depending on how different the Redis versions you are upgrading from and to are. It should at least be safe for minor version updates, but each major upgrade has its own caveats. For example, upgrading from 2.0 to 2.2 should be fine as long as you don’t use EXPIRE, since the way expirations are handled during replication changed between these versions. As with every other maintenance operation, test the procedure before running it on your production servers.

Backing up Redis

Problem

One issue that comes up frequently when talking about NoSQL databases is backing up your data. The notion that these databases are hard to back up, however, is mostly a misconception, since most of the techniques you’d use to back up a relational database can also be used for NoSQL databases.

If, for some distributed databases, grabbing a point-in-time snapshot of your data might be tricky, this is certainly not the case with Redis. In this section, we’ll explain how to achieve it depending on which Redis persistence model you’re using. We’ll assume you’re running your servers on Linux, although filesystem-specific functionality might also be available for other platforms.

Solution

Our proposed solution is heavily dependent on your Redis persistence model:

  • With the default persistence model (snapshotting), you’re best off using a snapshot as a backup.

  • If you’re using only AOF, you’ll have to back up your log in order to be able to replay it on startup.

  • If you’re running your Redis in VM mode, you might want to use an AOF log for the purpose of backups, as the use of snapshotting is not advised with VM.

It’s up to you to store your backups properly. Ideally, you’ll keep at least a couple of copies, have at least one offsite, and automate the whole process. We’ll explain how to do backups for the different persistence models, but be sure to test your own backup procedures, and test your recovery procedures regularly as well.

Keep in mind that backing up your data might increase the strain on your production systems. It’s probably a good idea to perform the backups on a slave Redis instance, and to actually have slaves running at all times because promoting a new server to master is probably quicker than restoring a backup.

Discussion

Snapshotting

Snapshotting is the default Redis persistence model. As mentioned earlier, depending on your settings, Redis will persist its data to disk if m keys changed in n seconds. When using this persistence mode, performing a backup is really simple: all you have to do is copy the current snapshot to another location.

Warning

Use a copy, not a move, because if Redis crashes and restarts and the snapshot is not there, you will end up losing all your data!

In case you want an up-to-date snapshot (instead of using the last one Redis did according to your settings) you can trigger it by issuing:

redis-cli BGSAVE

and then waiting for the dump file to be updated. Be sure to compress the snapshot before backing it up. That will probably reduce its size by at least a factor of 10.
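A minimal Python sketch of that backup step follows. The function name, paths, and timestamped file naming are hypothetical. The sketch copies (never moves) the dump and gzip-compresses it in one pass. It assumes, as is the case in Redis, that new snapshots are written to a temporary file and renamed over the old one, so the file you open is always a complete dump.

```python
import gzip
import os
import shutil
import time


def backup_snapshot(dump_path, backup_dir):
    """Copy the RDB snapshot into a timestamped, gzip-compressed file; return its path."""
    os.makedirs(backup_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = os.path.join(backup_dir, f"dump-{stamp}.rdb.gz")
    # Reading leaves the original in place, so Redis can still load it after a crash.
    with open(dump_path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

From there, shipping the compressed file offsite is up to your usual tooling.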

Restoring a snapshot file is also straightforward: shut down the server, put the snapshot you want to restore in the dbfilename location configured in redis.conf, and then start the server. This order is important, because when Redis shuts down, it performs a snapshot, overwriting this file.
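The file-handling part of a restore can be sketched in Python as well. The function name and paths are hypothetical, and the function must only run while the server is stopped. It decompresses a gzip backup next to the target and then swaps it in with a single rename, so a half-written file never sits at the dbfilename location.

```python
import gzip
import os
import shutil


def restore_snapshot(backup_path, dbfilename_path):
    """Put a gzip-compressed backup back at the dbfilename location.

    Run only while Redis is shut down; otherwise Redis will overwrite
    the restored file with its own snapshot when it stops.
    """
    tmp_path = dbfilename_path + ".restoring"
    with gzip.open(backup_path, "rb") as src, open(tmp_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # os.replace is atomic on the same filesystem: readers see old or new, never partial.
    os.replace(tmp_path, dbfilename_path)
```

Once the file is in place, start the server and it will load the restored snapshot.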

Append-Only Log (AOF)

If you’re using the AOF as the only persistence mode (you can also use it together with snapshotting), the easiest way to do a backup is still to use a snapshot as described in the previous section. However, if you’re using AOF, you’re most likely worried about losing data between snapshots, or you may be avoiding snapshots because they put too much load on your server.

In order to recover when using the AOF, just do the same procedure you would for snapshotting, but instead put your backup in the AOF location. On startup, Redis will simply replay the log.

Remember to run BGREWRITEAOF regularly if you’re using AOF.

Should your Redis server refuse to start due to a corrupted AOF—which can happen if the server crashes or is killed while writing to the file—you can use the redis-check-aof utility to fix your AOF:

redis-check-aof --fix filename

VM

If you are running Redis in VM mode, be sure to understand the tradeoffs. Starting or stopping your server will take a long time if you have a big dataset, and performing a snapshot to back up your data might also take a long time. Nevertheless, if your Redis instances are running with VM enabled, you should still perform backups, preferably on a slave that is not too busy serving requests.

If you have a big database, using BGSAVE to perform a snapshot is probably not feasible. You’re most likely better off using AOF and rewriting it at regular intervals (how often depends on how much your data changes, but you probably don’t want to do this too often). Beware that while a BGSAVE or BGREWRITEAOF is running, Redis will not swap values out to the VM, so your memory usage might increase while these background operations are in progress.

Since you’re backing up your AOF, the restoring procedure is exactly the same as in the previous section: just copy over your AOF and start Redis.

Sharding Redis

Problem

Sharding is a horizontal partitioning technique often used with databases. It allows you to scale them by distributing your data across several database instances. Not only does this let you have a bigger dataset, since you can use more memory, it also helps if CPU usage is the problem, since you can distribute your instances across different servers (or servers with multiple CPUs).

In Redis’s case, sharding can be easily implemented in the client library or application.

Solution

Since Redis Cluster is still under development and is only expected to be released later in 2011 (with a beta most likely arriving in the summer), sharding is a useful technique for scaling your application when your data no longer fits on a single server.

Currently there are three possibilities when it comes to sharding Redis databases:

Use a client with built-in sharding support

At this point, most Redis clients don’t support sharding. Notable exceptions are:

Predis, a PHP client
Redisent, a PHP client
Rediska, a PHP client
Jedis, a Java client
scala-redis, a Scala client

Build sharding support yourself on top of an existing client

This involves some programming that might not be too hard if you understand your dataset and applications thoroughly. At the very least, you’ll have to implement a partitioning rule and handle the connections to the different servers.

Use a proxy that speaks the Redis protocol and does the sharding for you

Redis Sharding is a multiplexed proxy that provides sharding to any Redis client. Instead of connecting directly to your Redis servers, you start a proxy and connect to it instead. Unfortunately, Redis Sharding doesn’t currently support resharding on the fly, so you won’t be able to change the configuration of the cluster while the proxy is running.
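A minimal Python sketch of the second option (building sharding on top of an existing client) might look like the following. ModuloSharder is a hypothetical name. connect stands in for whatever your client library uses to open a connection to a host and port. The partitioning rule is a plain hash of the key modulo the number of servers, and connections are opened lazily and reused.

```python
import hashlib


class ModuloSharder:
    """Map each key to one of several Redis servers with a hash-mod-N rule.

    `connect` is a placeholder for your client library's connection factory.
    """

    def __init__(self, servers, connect):
        self.servers = list(servers)
        self.connect = connect
        self._conns = {}

    def server_for(self, key):
        # md5 gives the same answer in every process, unlike Python's salted hash().
        digest = hashlib.md5(key.encode()).digest()
        return self.servers[int.from_bytes(digest[:4], "big") % len(self.servers)]

    def client_for(self, key):
        server = self.server_for(key)
        if server not in self._conns:  # open each connection once, on first use
            self._conns[server] = self.connect(*server)
        return self._conns[server]
```

The weakness of this rule is that changing the number of servers changes the result for most keys. Going from three servers to four moves roughly three quarters of them, which is why consistent hashing is usually preferable.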

Discussion

If you decide to implement sharding yourself, you should probably use consistent hashing. This will ensure a minimal amount of remapping if you add or remove shards.
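Here is one way to implement consistent hashing in Python, offered as a sketch rather than a reference implementation. Each shard is placed at many points (virtual nodes) on a ring of md5 hashes, and a key belongs to the first shard point at or after its own hash. HashRing, the default of 100 points per shard, and the shard names used in practice are hypothetical choices.

```python
import bisect
import hashlib


def _point(label):
    """Position of a string on the ring: the first 8 bytes of its md5 digest."""
    return int.from_bytes(hashlib.md5(label.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hashing with virtual nodes; each shard owns many ring points."""

    def __init__(self, shards, replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted list of (point, shard) pairs
        for shard in shards:
            self.add(shard)

    def add(self, shard):
        for i in range(self.replicas):
            bisect.insort(self._ring, (_point(f"{shard}#{i}"), shard))

    def remove(self, shard):
        self._ring = [entry for entry in self._ring if entry[1] != shard]

    def shard_for(self, key):
        # Walk clockwise to the first shard point at or after the key's point.
        i = bisect.bisect_left(self._ring, (_point(key), ""))
        return self._ring[i % len(self._ring)][1]
```

When a shard is added, only keys whose points fall just before the new shard's points move, and they all move to the new shard. With four equally weighted shards, that is roughly a quarter of the keys. Removing the shard again restores the previous mapping.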

Sharding doesn’t remove the need for replication. Make sure your cluster is redundant so that the loss of a server doesn’t imply any loss of data. Jeremy Zawodny has described on his blog the setup used at Craigslist, and Salvatore Sanfilippo, the author of Redis, has written on the subject as well.

Something else to keep in mind is that (depending on your implementation) you will not be able to perform some operations that affect multiple keys, because those keys might be in different shards (servers). If you rely on these operations, you’ll need to adjust your hashing algorithm to ensure that all the required keys will always be in the same shard.
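One way to adjust the hashing is to hash only a designated part of each key, so that related keys share a shard. In the Python sketch below, if a key contains a non-empty {...} section, only that section is hashed. The brace convention and the function names are assumptions for this sketch. The convention matches the hash tags that Redis Cluster eventually adopted.

```python
import hashlib


def shard_key(key):
    """Return the part of the key used for sharding.

    If the key has a non-empty {...} section, only that section is hashed,
    so "user:{42}:profile" and "user:{42}:friends" land on the same shard.
    """
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            return key[start + 1:end]
    return key


def shard_index(key, num_shards):
    """Pick a shard for the key using only its shard key."""
    digest = hashlib.md5(shard_key(key).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_shards
```

Multi-key operations then work as long as every key involved carries the same tag. Keys tagged {42} and {43} may still end up on different shards.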
