Failure management

In this section, we deal with detecting failures and with the actions needed to rectify them. Failures can occur at the level of a drive, a server, a zone, or even an entire region. As described in the CAP theorem discussion in Chapter 2, OpenStack Swift Architecture, Swift is designed for availability and for tolerance to partial failures, in which entire parts of a cluster can fail.

Detecting drive failures

Kernel logs are a good place to look for drive failures. The disk subsystem will log warnings or errors that can help an administrator determine whether drives are going bad or have already failed. We can also set up a script on storage nodes (explained in the following steps) to capture drive failure information using the drive audit process described ...
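To give a rough idea of what such a log scan involves, here is a minimal Python sketch. It looks through kernel log lines for error messages that name a SCSI-style device (`sdX`) and reports every device whose error count reaches a threshold. The names `ERROR_PATTERNS`, `count_drive_errors`, `suspect_drives`, and `error_limit` are hypothetical. The regular expressions and the threshold are illustrative assumptions, not the patterns or defaults that ship with Swift's drive audit. In practice you would feed it lines from your distribution's kernel log, for example `/var/log/kern.log`, although that path varies between systems.

```python
import re
from collections import Counter

# Hypothetical patterns (assumption): kernel messages that mention an sdX
# device and the word "error" in either order. Tune these for your kernel
# and hardware; they are not the patterns Swift ships with.
ERROR_PATTERNS = [
    re.compile(r"\berror\b.*\b(sd[a-z]{1,2}\d?)\b", re.IGNORECASE),
    re.compile(r"\b(sd[a-z]{1,2}\d?)\b.*\berror\b", re.IGNORECASE),
]


def count_drive_errors(lines):
    """Count error lines per device name (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        for pattern in ERROR_PATTERNS:
            match = pattern.search(line)
            if match:
                counts[match.group(1)] += 1
                break  # count each log line at most once
    return counts


def suspect_drives(lines, error_limit=1):
    """Return sorted device names whose error count reaches error_limit."""
    counts = count_drive_errors(lines)
    return sorted(dev for dev, n in counts.items() if n >= error_limit)


if __name__ == "__main__":
    sample = [
        "end_request: I/O error, dev sdb, sector 12345",
        "sd 0:0:0:0: [sdd] Unhandled error code",
        "EXT4-fs (sdb1): mounted filesystem with ordered data mode",
        "end_request: I/O error, dev sdb, sector 67890",
    ]
    print(suspect_drives(sample, error_limit=2))  # prints ['sdb']
```

Device names are counted exactly as they appear in the log, so a partition such as `sdc1` is tracked separately from the whole disk `sdc`. A real audit would also restrict the scan to a recent time window and then act on the result, for example by unmounting the suspect drive, which this sketch leaves out.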
