In this chapter, we look at how to tune Cassandra to improve performance. A variety of settings in the configuration file help us do this, and we present a few pointers on hardware selection and configuration. There are several isolated settings that you can update in Cassandra’s configuration file; although the defaults are often appropriate, there might be circumstances in which you need to change them. In this chapter, we look at several of those settings.
As a general rule, it’s important to note that simply adding nodes to a cluster will not improve performance on its own. You need to replicate the data appropriately, then send traffic to all the nodes from your clients. If you aren’t distributing client requests, the new nodes could just stand by somewhat idle.
We also see how to use the Python stress test tool that ships with Cassandra to run a reasonable load against Cassandra and quickly see how it behaves under stress test circumstances. We can then tune Cassandra appropriately and feel confident that we’re ready to launch in a staging environment.
There are two sets of files that Cassandra writes to as part of handling update operations: the commit log and the datafile. Their different purposes need to be considered in order to understand how to treat them during configuration.
The commit log can be thought of as short-term storage. As Cassandra receives updates, every write value is written immediately to the commit log in the form ...