Disaster recovery

If the contents of the swarm directory on a manager are lost or corrupted, remove that manager from the cluster immediately with the docker node remove nodeID command (add --force if the removal gets stuck).
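The removal described above can be sketched as a small shell helper, run from a healthy manager. This is only an illustration under stated assumptions: the function name remove_manager, the DOCKER override variable, and the node ID abc123 are hypothetical. It uses docker node rm, of which docker node remove is an alias. The demote step is not in the text above; it is included because current Docker releases generally refuse to remove a node that is still a manager. The --force fallback here triggers when the plain removal returns an error; a command that hangs would still need manual intervention.

```shell
#!/bin/sh
# Hypothetical helper: drop a broken manager from the swarm.
# Run it on a healthy manager, not on the damaged node.
remove_manager() {
    node_id="$1"
    # DOCKER is an assumed override for dry runs, e.g. DOCKER="echo docker".
    docker_cmd="${DOCKER:-docker}"

    # Assumption: demote first, since a manager normally cannot be removed directly.
    $docker_cmd node demote "$node_id"

    # Try a normal removal; fall back to --force if it returns an error.
    $docker_cmd node rm "$node_id" || $docker_cmd node rm --force "$node_id"
}
```

Setting DOCKER="echo docker" before calling remove_manager abc123 prints the commands instead of running them, which may be useful for checking the sequence before touching a live cluster.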

The cluster administrator should not start a manager, or join it to the cluster, with an out-of-date swarm directory. Doing so brings the cluster into an inconsistent state, because all managers will try to synchronize the wrong data during the process.

After bringing down the manager with the corrupted directory, delete the /var/lib/docker/swarm/raft/wal and /var/lib/docker/swarm/raft/snap directories. Only after this step can the manager safely re-join ...
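A minimal sketch of the cleanup step, assuming it is run as root on the affected manager after the Docker daemon there has been stopped. The function name clean_raft_state and its optional directory argument are hypothetical; the default path and the two subdirectories are the ones named above.

```shell
#!/bin/sh
# Hypothetical helper: delete the Raft write-ahead log and snapshots
# from a swarm directory, leaving everything else in it untouched.
clean_raft_state() {
    # Defaults to the swarm directory named in the text; an explicit
    # argument allows a rehearsal against a copy.
    swarm_dir="${1:-/var/lib/docker/swarm}"

    # Only the two Raft directories are removed.
    rm -rf "$swarm_dir/raft/wal" "$swarm_dir/raft/snap"
}
```

Passing a scratch directory as the argument lets the step be rehearsed without touching /var/lib/docker/swarm on a real host.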
