In this chapter, we look at how to configure Cassandra. We walk through creating keyspaces, setting up replication, and using a proper replica placement strategy.
Cassandra works out of the box with no configuration at all; you can simply download and decompress, and then execute the program to start the server with its default configuration.
We will focus on aspects of Cassandra that affect node behavior in a cluster, performance, and meta-operations such as replication, partitioning, and snitches. Performance tuning is considered a separate topic and gets its own treatment in Chapter 11.
Cassandra development is moving quickly, and there have been many inveterate changes. I’ve tried my best to keep up with them here.
Keyspaces used to be defined statically in an XML configuration file, but as of 0.7, you can use the API to create keyspaces and column families.
In Cassandra version 0.6 and prior, configuration for your cluster and column families was stored in a file called storage-conf.xml. Then there was a transitional conversion from XML to YAML, so you will see references to storage-conf.xml and cassandra.yaml. But version 0.7 introduced dynamic loading, so all creation and modification of keyspace and column family definition is done through the Thrift API or the command-line interface (CLI).
Starting with version 0.7 of Cassandra, you can use API operations to make changes to your schemas, much like you would in SQL by issuing Data ...