Anti-entropy repair is triggered manually, typically with `nodetool repair`. It is recommended to run it periodically, because hints and read repair alone are often not sufficient to keep replicas in sync.
Anti-entropy is the process of comparing the data on all replicas and updating each replica to the newest version. Cassandra accomplishes this using Merkle trees, similar to Dynamo and Riak. The process has three phases:
- Building a Merkle tree for each replica
- Comparing the Merkle trees to discover differences
- Streaming the relevant data
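
The three phases above can be illustrated with a small, self-contained sketch. This is not Cassandra's implementation: the replicas are modeled as plain dictionaries mapping a key to a `(value, timestamp)` pair, and `NUM_LEAVES`, `bucket_of`, `build_tree`, `diff_trees`, `stream`, and `repair` are hypothetical names chosen for illustration. A hash-based bucket stands in for a token range, and "newest version" is assumed to mean the highest timestamp.

```python
import hashlib

NUM_LEAVES = 8  # assumed number of ranges per tree; must be a power of two


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def bucket_of(key: str) -> int:
    # Map a key to a leaf index; a stand-in for a token range lookup.
    return int.from_bytes(_h(key.encode())[:4], "big") % NUM_LEAVES


def build_tree(replica: dict) -> list:
    """Phase 1: build the tree as a list of levels, leaves first, root last."""
    buckets = [[] for _ in range(NUM_LEAVES)]
    for key, (value, ts) in replica.items():
        buckets[bucket_of(key)].append(f"{key}={value}@{ts}")
    level = [_h("|".join(sorted(b)).encode()) for b in buckets]
    levels = [level]
    while len(level) > 1:
        # Each parent hash covers its two children.
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def diff_trees(a: list, b: list, depth=None, index: int = 0) -> list:
    """Phase 2: return the leaf indices whose hashes differ."""
    if depth is None:
        depth = len(a) - 1  # start at the root
    if a[depth][index] == b[depth][index]:
        return []  # identical subtree, no need to descend
    if depth == 0:
        return [index]
    return (diff_trees(a, b, depth - 1, 2 * index)
            + diff_trees(a, b, depth - 1, 2 * index + 1))


def stream(src: dict, dst: dict, leaves: list) -> None:
    """Phase 3: copy newer versions in the differing ranges from src to dst."""
    wanted = set(leaves)
    for key, (value, ts) in src.items():
        if bucket_of(key) in wanted and (key not in dst or dst[key][1] < ts):
            dst[key] = (value, ts)


def repair(r1: dict, r2: dict) -> list:
    leaves = diff_trees(build_tree(r1), build_tree(r2))
    stream(r1, r2, leaves)
    stream(r2, r1, leaves)
    return leaves


# Usage: replica b holds a newer k2 and an extra k3 that a has not seen.
a = {"k1": ("v1", 1), "k2": ("v2", 1)}
b = {"k1": ("v1", 1), "k2": ("v2-new", 2), "k3": ("v3", 1)}
repair(a, b)
assert a == b
```

Because matching subtrees are skipped during comparison, only the ranges that actually differ are exchanged, which is the reason for hashing ranges into a tree rather than comparing every row.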
Why is running anti-entropy repair frequently so important? Consider a cluster with a replication factor of 3. Suppose a partition ...