
Apache Cassandra is an open source database system that belongs to the category of databases referred to as NoSQL, or Not only SQL (Structured Query Language). Thanks to its unique data model and architecture, Cassandra is designed to handle the massive scalability challenges commonly encountered by modern web applications. In this post, we will cover the basics of Cassandra to get you started on your journey to explore this powerful database system.

How is Cassandra different from SQL databases?

In traditional relational databases (such as MySQL, Microsoft SQL Server, and Oracle), data is stored in tables. Each table consists of rows that are made up of a fixed number of predefined columns. Data is stored and retrieved using the powerful Structured Query Language (SQL), which allows data to be selected, joined, and manipulated in other ways to perform complex querying operations.

None of this is available in Apache Cassandra. In Cassandra, all of the data is stored in one single table (hence the reference in the title). The rows of this table consist of columns (just like in relational databases), but these columns are organized into groups called column families. The column families are fixed and have to be predefined. Within each column family, however, the columns can vary from row to row. This gives each row a dynamic structure, something not possible with SQL databases.

Each row has a unique key within Cassandra. The data within the column families for each row is stored with respect to this key. Thus, in essence, Cassandra is a key-value data store. The column families and columns, however, allow complex structures to be stored. We will explore how to make use of this unique data model in a minute.
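
To make the layout described above concrete, here is a minimal sketch in Python. It is a toy in-memory model, not Cassandra's API: the nested dictionaries, the get_row helper, and the sample names and values are all assumptions made for illustration.

```python
# Toy in-memory model of the layout described above; not Cassandra's API.
# row key -> column family -> {column name: column value}
table = {
    "1000": {
        "Users": {"name": "Alice", "age": "30"},
    },
    "1001": {
        "Users": {"name": "Bob", "email": "bob@example.com"},
    },
}


def get_row(table, row_key, column_family):
    """Look up one row's columns in a column family by its unique key."""
    return table[row_key][column_family]


# Both rows use the same column family but carry different columns.
print(sorted(get_row(table, "1000", "Users")))  # ['age', 'name']
print(sorted(get_row(table, "1001", "Users")))  # ['email', 'name']
```

The lookup is by key only, which is what makes the store key-value in nature; the column family and its columns add structure to the value.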

Download and installation

Apache Cassandra can be downloaded from the Apache Cassandra project website. The software is written in Java and requires at least Java 1.6 to execute properly. Make sure the Java installation is configured properly before attempting to run Cassandra. Detailed configuration instructions can be found in the project documentation, which also lists the basic configuration options that should be set for proper functioning. If anything goes wrong or you are unable to run the Cassandra server, see the section “If something goes wrong” there. Its quick tips might help to identify and resolve the problem.

Tour de Cassandra

Once Cassandra is up and running, we can use the Cassandra CLI (Command Line Interface) tool to interact with any Cassandra node. In this case, we will connect with the node running on the localhost (assuming a basic configuration is in place). To execute the CLI tool, type in the command line:

The output will be something like:

The installed version will be indicated in the output. At this point, the CLI tool is running but it is not connected to any Cassandra node. To connect to the node at localhost, type:

If the output contains text starting with “Connected to:” then the connection is successful. We can now begin using Cassandra:

This command creates a keyspace within the Cassandra cluster (in our case, only one node on localhost). Keyspaces are analogous to a database or schema in the RDBMS world. They are used to group together keys belonging to one particular application. One Cassandra cluster can have multiple keyspaces, just like a single RDBMS server can store multiple databases (schemas).
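
The grouping role of a keyspace can be sketched in Python as follows. This is an illustrative model only, not a Cassandra client: the create_keyspace helper, the second keyspace name OTHER_APP, and the sample data are assumptions.

```python
# Toy model: a cluster holds keyspaces, and each keyspace holds its own
# keys. Illustrative only; this is not a Cassandra client.
cluster = {}


def create_keyspace(cluster, name):
    """Register an empty keyspace; refuse to overwrite an existing one."""
    if name in cluster:
        raise ValueError(f"keyspace {name!r} already exists")
    cluster[name] = {}
    return cluster[name]


test = create_keyspace(cluster, "TEST")
other = create_keyspace(cluster, "OTHER_APP")  # hypothetical second application

# A key stored in one keyspace is not visible from another.
test["1000"] = {"Users": {"name": "Alice"}}

print(sorted(cluster))                   # ['OTHER_APP', 'TEST']
print("1000" in test, "1000" in other)   # True False
```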

The command above creates the keyspace ‘TEST’. Before we can start working with this keyspace, we have to select it and authenticate ourselves with the ‘use’ command:

Since we did not specify any username/password combination for accessing the keyspace, we can use it directly. If a username/password combination were required, the format of the ‘use’ command would be:

Next, we need to create a column family to use for storing data:

The command above needs some explanation. ‘create’ is the command for creating anything in Cassandra. In this case we are creating a ‘column family’, which is specified next. The attribute = value pairs after the keyword ‘with’ specify additional attributes for creation. We have specified two: comparator and default_validation_class. ‘comparator’ specifies the type used to validate and compare (and therefore order) column names. Similarly, ‘default_validation_class’ specifies the type used to validate column values.
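
The division of labour between the two attributes can be sketched in Python. This is a toy model under stated assumptions, not Cassandra's implementation: the ColumnFamily class and the utf8_type function are hypothetical names that only loosely mimic a text type.

```python
# Toy illustration of the two attributes; not Cassandra's implementation.

def utf8_type(value):
    """Accept only text, loosely mimicking a UTF-8 string type."""
    if not isinstance(value, str):
        raise TypeError(f"expected text, got {type(value).__name__}")
    return value


class ColumnFamily:
    def __init__(self, comparator, default_validation_class):
        self.comparator = comparator  # checks (and orders) column names
        self.default_validation_class = default_validation_class  # checks values
        self.rows = {}

    def set(self, row_key, name, value):
        name = self.comparator(name)
        value = self.default_validation_class(value)
        self.rows.setdefault(row_key, {})[name] = value

    def get(self, row_key):
        # Columns are returned ordered by name, as the comparator dictates.
        return sorted(self.rows[row_key].items())


users = ColumnFamily(comparator=utf8_type, default_validation_class=utf8_type)
users.set("1000", "name", "Alice")
users.set("1000", "age", "30")
print(users.get("1000"))  # [('age', '30'), ('name', 'Alice')]
```

In this sketch, storing a non-text value such as the integer 30 raises a TypeError, which is the kind of rejection a validation class exists to provide.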

Once the column family has been created, we can start storing data in it. To do this, we use the ‘set’ command:

Here ‘Users’ specifies the column family we just created, 1000 is the row key, and name is the column name. The column value is specified after the equals sign (=). Note that in Cassandra the columns within a column family are sparse, meaning that each row can hold an arbitrary number of columns (with any names). Thus, if we want to store the age of the user as well, we can use:

To retrieve the data we just stored, we can use:

This will output something like (in our case):

For each column within a row, Cassandra stores the column name, value, and timestamp. The name and value are what we specify when storing data, whereas the timestamp is used internally by the Cassandra cluster, chiefly to decide which write is the most recent when different versions of a column exist.
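
One internal use of the timestamp is deciding which of two versions of a column is the newer one. The sketch below illustrates that idea in Python. It is a simplified model, not Cassandra's code: the Column type, the make_column and reconcile helpers, and the sample values are assumptions, and the tie handling here is arbitrary.

```python
import time
from typing import NamedTuple


class Column(NamedTuple):
    name: str
    value: str
    timestamp: int  # microseconds since the epoch in this sketch


def make_column(name, value, timestamp=None):
    """Build a column, stamping it with the current time if none is given."""
    if timestamp is None:
        timestamp = int(time.time() * 1_000_000)
    return Column(name, value, timestamp)


def reconcile(a, b):
    """Return the more recently written of two versions of one column."""
    if a.name != b.name:
        raise ValueError("can only reconcile versions of the same column")
    return a if a.timestamp >= b.timestamp else b


old = make_column("name", "Alice", timestamp=1_000)
new = make_column("name", "Alicia", timestamp=2_000)
print(reconcile(old, new).value)  # Alicia
print(reconcile(new, old).value)  # Alicia
```

The result does not depend on the order in which the two versions are compared, which is what lets separate nodes agree on the same value.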


Apache Cassandra is a modern database technology that is designed to solve the challenges associated with massive scalability. It offers a unique data model that is based upon one big table containing multiple column families, with each row having an arbitrary number of columns within each column family. In this article we got a bird’s eye view of Cassandra. In later articles, we will cover data modelling, programming and administration of Cassandra in more detail.


About the authors

Salman Ul Haq is a techpreneur, co-founder and CEO of TunaCode, Inc., a startup that delivers GPU-accelerated computing solutions to time-critical application domains. He holds a degree in Computer Systems Engineering. His current focus is on delivering the right solution for cloud security.
Shaneeb Kamran is a Computer Engineer from one of the leading universities of Pakistan. His programming journey started at the age of 12, and ever since he has dabbled in every new and shiny software technology he could get his hands on. He is currently involved in a startup that is working on cloud computing products.
