Data replication in Cassandra

So far, we've developed a model of distribution in which the total data set is distributed among multiple machines, but any given piece of data lives on only one machine. This model carries a big advantage over a single-node configuration, which is that it's horizontally scalable. By distributing data over multiple machines, we can accommodate ever-larger data sets simply by adding more machines to our cluster.

But our current model doesn't solve the problem of fault tolerance. No hardware is perfect; any production deployment must acknowledge the fact that a machine might fail. Our current model isn't resilient to such failures: for instance, if Node 1 in our original three-node cluster were to suddenly catch ...

Get Learning Apache Cassandra - Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.