Disk

Disk is probably the resource with the most room for variance. Several factors help determine the optimal disk size:

  • Anticipated size of a single copy of the dataset
  • Replication Factor (RF)
  • Operational throughput requirements
  • Cost of cloud volumes (usually per hour)
  • Compaction strategy used on the larger tables
  • Whether the size of the dataset will be static, or grow over time
  • Whether the application team has an archival strategy

I have built production Cassandra instances on disks as large as 1 TB and as small as 40 GB. Typically, nodes that hold more data also need more compute resources available to them.
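
To make the interaction between these factors concrete, here is a minimal back-of-the-envelope sketch in Python. It is not a formula from this book: the function name `per_node_disk_gb`, its parameters, and the default values are all hypothetical, chosen only for illustration. It assumes data is spread evenly across nodes, that growth compounds annually, and that a fixed fraction of each disk is kept free as compaction headroom. The 0.5 default is an assumption in the spirit of the conservative rule of thumb often quoted for size-tiered compaction; the right figure depends on the compaction strategy and the workload.

```python
import math


def per_node_disk_gb(dataset_gb, replication_factor, node_count,
                     compaction_headroom=0.5, annual_growth=0.0, years=0):
    """Rough per-node disk estimate in GB (illustrative sketch only).

    All parameters are hypothetical inputs:
      dataset_gb          -- size of a single copy of the dataset
      replication_factor  -- RF of the keyspace
      node_count          -- number of nodes sharing the data
      compaction_headroom -- fraction of disk kept free for compaction (assumed)
      annual_growth       -- yearly growth rate, e.g. 0.2 for 20% (assumed)
      years               -- planning horizon in years
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if not 0 <= compaction_headroom < 1:
        raise ValueError("compaction_headroom must be in [0, 1)")

    # Project the single-copy dataset size forward with compound growth.
    projected_gb = dataset_gb * (1 + annual_growth) ** years
    # Each piece of data is stored replication_factor times in the cluster.
    total_gb = projected_gb * replication_factor
    # Assume an even distribution of data across the nodes.
    data_per_node_gb = total_gb / node_count
    # Size the disk so the data fills only (1 - headroom) of it.
    return math.ceil(data_per_node_gb / (1 - compaction_headroom))


if __name__ == "__main__":
    # Made-up inputs: 200 GB dataset, RF of 3, six nodes, no growth.
    print(per_node_disk_gb(200, 3, 6))
```

With these made-up inputs, 200 GB at an RF of 3 is 600 GB in total, or 100 GB per node across six nodes, and keeping half of each disk free doubles that to 200 GB. Throughput requirements and volume pricing are deliberately left out of the sketch; they still need to be weighed separately.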

Let's walk through a little exercise here.

Assume that we need to build a cluster for an application ...
