Hadoop in a Cassandra cluster

The production version of the Hadoop and Cassandra combination needs to go into a separate cluster. The first obvious issue is you probably wouldn't want Hadoop to keep polling Cassandra nodes, hampering Cassandra's performance to end users. The general pattern to avoid this is to split the ring into two data centers. Since Cassandra automatically and immediately replicates the changes between data centers, they will always be in sync. What's more, you can assign one of the data centers as transactional with a higher replication factor and the other as an analytical data center with a replication factor 1. The analytical data center is the one used by Hadoop without affecting the transactional data center.

Now, you ...

Get Mastering Apache Cassandra - Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.