Data replication in Cassandra

So far, we've developed a model of distribution in which the total data set is distributed among multiple machines, but any given piece of data lives on only one machine. This model carries a big advantage over a single-node configuration, which is that it's horizontally scalable. By distributing data over multiple machines, we can accommodate ever-larger data sets simply by adding more machines to our cluster.

But our current model doesn't solve the problem of fault-tolerance. No hardware is perfect; any production deployment must acknowledge that a machine might fail. Our current model isn't resilient to such failures: for instance, if Node 1 in our original three-node cluster were to suddenly catch fire, we would ...

Get Learning Apache Cassandra now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.