
Apache Cassandra is a powerful NoSQL database that scales to very large workloads while remaining fault tolerant. Its data storage model was explored in previous articles (try this article), and we briefly covered the Cassandra CLI tool (try this article), which can be used to interact with Cassandra and run queries using the Cassandra Query Language (CQL). In this article, we look at Cassandra from an administration point of view and see how to perform common administrative tasks using the NodeTool that ships with Cassandra.

The NodeTool

An Apache Cassandra setup runs as a cluster of multiple nodes that are connected together and run in parallel. All nodes are identical in functionality, so there is no single “master” node. As a result, Cassandra has no single point of failure: there is no one node whose failure would bring down the system as a whole. This property is what gives Cassandra its fault tolerance.

The NodeTool is a utility that ships with every Cassandra installation and lets an administrator perform various administrative tasks on a Cassandra node (or on the system as a whole via that node). Its requirements are the same as Cassandra’s: Java 1.6 or later must be installed, and the ‘bin’ directory of Cassandra should be present in the PATH environment variable. To check that the utility works, run “nodetool” with no arguments.
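A minimal sketch of that first run, assuming a Unix-like shell. The PATH check, the NODETOOL_ON_PATH variable and the fallback message are illustrative additions and not part of Cassandra:

```shell
# Check that the nodetool executable can be found before running it.
if command -v nodetool >/dev/null 2>&1; then
    NODETOOL_ON_PATH=yes
    # With no arguments, nodetool is expected to print its list of commands.
    # The usage text may come with a non-zero exit status, so ignore it here.
    nodetool || true
else
    NODETOOL_ON_PATH=no
    echo "nodetool not found; is Cassandra's bin directory on PATH?" >&2
fi
```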

If the output is a (long) list of commands, each with a short description, then the NodeTool is ready to be used. Otherwise, you will need to troubleshoot why it can’t be executed. Common causes are a corrupt installation or a PATH variable that is not set correctly. Try reinstalling Cassandra and running the tool again. If that still fails, search the Cassandra community forums, where chances are that someone has already posted a solution.

Common operations

If a Cassandra node is running on your local system, try connecting to it. The ‘-h’ command line flag specifies the host, and the ‘-p’ flag specifies the JMX port if it differs from the default. For a local node, the host is simply “localhost”.
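As a sketch of such a connection: the host and port values below are assumptions (a node on the local machine listening on JMX port 7199, the usual default in recent releases), so adjust them to match your own setup.

```shell
HOST=localhost   # assumption: the node runs on this machine
PORT=7199        # assumption: the JMX port of a default configuration

# No command is given yet, so the usage text is the expected output.
nodetool -h "$HOST" -p "$PORT" || echo "nodetool returned a non-zero status" >&2
```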

On its own, this still produces the long list of commands, because the NodeTool also needs a “command” to execute. We can try the “info” command first by adding it after the host.
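A minimal sketch of the “info” call, again assuming a node on the local machine. The error message is an illustrative addition:

```shell
HOST=localhost   # assumption: the node runs on this machine

# Ask the node for its basic status information.
nodetool -h "$HOST" info || echo "could not query $HOST (is the node running?)" >&2
```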

This outputs a few key: value pairs with internal information about the node, including its token, uptime and heap memory usage.

Next we can try the “ring” command, which prints the token ring information. The token ring determines how data is partitioned across the nodes of the Cassandra cluster, so this information is critical when a cluster is facing data consistency issues. To view it, run the “ring” command against one of the nodes.
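A minimal sketch of the “ring” call, assuming the local node from the earlier examples. The error message is an illustrative addition:

```shell
HOST=localhost   # assumption: the node runs on this machine

# Print the token ring as seen from this node.
nodetool -h "$HOST" ring || echo "could not query $HOST (is the node running?)" >&2
```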

This lists all the nodes that take part in the ‘ring’, the tokens assigned to each of them, and the status of each node. If one of the nodes is down, it has to be brought back up and then repaired so that its data is consistent with the rest of the cluster again.

Quick repairs

The NodeTool provides a single command, “repair”, that makes repairing a node quick and easy.
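A minimal sketch of a repair run, assuming the local node from the earlier examples. The error message is an illustrative addition:

```shell
HOST=localhost   # assumption: the node runs on this machine

# Start a repair on this node; expect this to take a while on large data sets.
nodetool -h "$HOST" repair || echo "repair on $HOST did not complete" >&2
```

The “repair” command can also be given a keyspace name after it, which limits the work to that keyspace.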

The “repair” command detects and repairs inconsistencies across the replicas of a given range of data. It should be run at regular intervals during normal operation to keep the data consistent, and it is also very useful when recovering a node. Keep in mind, however, that repair is an expensive operation in terms of both disk and CPU usage, so be cautious about running it on multiple nodes at the same time.
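One cautious way to repair several nodes is to run them one after another rather than in parallel. The sketch below assumes that “nodetool repair” returns only once the repair on that node has finished. The host names are hypothetical placeholders, and the loop and counter are illustrative rather than anything Cassandra provides:

```shell
# Hypothetical host names; replace with the addresses of your own nodes.
NODES="node1.example.com node2.example.com node3.example.com"

REPAIRED=0
for node in $NODES; do
    echo "Repairing $node ..."
    # The next node is started only after this call returns,
    # so repairs launched by this loop never overlap.
    if nodetool -h "$node" repair; then
        REPAIRED=$((REPAIRED + 1))
    else
        echo "Repair failed on $node; stopping here" >&2
        break
    fi
done
echo "Repaired $REPAIRED node(s)"
```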

Conclusion

In this article, we briefly covered the basic administrative tasks associated with Apache Cassandra. Specifically, we covered the NodeTool that ships with Cassandra and used it for the common administrative tasks of monitoring and repairing a cluster.

About the authors

Salman Ul Haq is a techpreneur, co-founder and CEO of TunaCode, Inc., a startup that delivers GPU-accelerated computing solutions to time-critical application domains. He holds a degree in Computer Systems Engineering. His current focus is on delivering the right solution for cloud security. He can be reached at salman@tunacode.com.
Shaneeb Kamran is a Computer Engineer from one of the leading universities of Pakistan. His programming journey started at the age of 12, and ever since he has dabbled in every new and shiny software technology he could get his hands on. He is currently involved in a startup that is working on cloud computing products.
