The recent popularity of social media sites such as Facebook, Twitter and Google+ has brought new challenges for popular websites on the Internet. Because a link can suddenly go “viral”, the associated website may experience the Slashdot effect (sudden, highly unpredictable spikes in traffic that last for brief periods of time), which means that such websites must always be prepared to handle a large volume of traffic. This is a problem of scalability, and in this article we discuss scalability with regard to one core component of web applications: the database. Specifically, we will discuss some popular NoSQL databases and how their unique features can be used to build highly scalable websites.
Traditional databases are often referred to as SQL databases, after the query language they use, and their underlying model is the strict relational model grounded in relational algebra. The rigidity of this model, combined with the fact that related tables must be joined together for complex queries, introduces certain restrictions in Relational Database Management Systems (RDBMS) that limit their scalability. To deal with this problem, a new breed of databases, referred to as NoSQL (Not only SQL), has been created. Each database in this new class offers a unique combination of features geared towards solving a particular kind of problem. In this article we discuss three popular NoSQL databases, each representing a different genre within the NoSQL ecosystem: MongoDB, Redis and Apache Cassandra. For each database, we discuss the data storage model it offers and how you can use it in your own web applications to achieve scalability and optimize certain kinds of operations.
The first database on the list is MongoDB, an open source, document-oriented NoSQL database. Data is stored in the form of ‘documents’, which are JSON-like objects, and similar documents are grouped together in collections. Compared with SQL databases, collections are roughly equivalent to tables and documents to rows. The advantage of this model is that documents can contain other documents nested within them, which allows very complex document structures to be created. Another differentiating factor is that documents cannot be linked or related at the database level; any relation between documents has to be handled at the application level. Because each document is self-contained, documents can be distributed across multiple nodes (a technique known as sharding), and a router node, which sits between the application and the cluster of database nodes, is responsible for finding documents and returning them to the application.
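To make the document model and the router's role more concrete, here is a minimal Python sketch that uses plain dicts as stand-in documents. It is not the MongoDB or pymongo API: `ToyRouter` and its `insert` and `find_one` methods are hypothetical, and placing documents by hashing their `_id` is an assumption chosen for simplicity (real MongoDB sharding is configured per collection on a chosen shard key). The sketch shows a nested document, a reference between collections that the "database" does not enforce, and the application-level lookup that takes the place of a join.

```python
import hashlib

# Hypothetical, in-process stand-in for a sharded document store.
# Plain Python dicts play the role of JSON-like documents; this is NOT the
# MongoDB/pymongo API, only an illustration of the data model.


class ToyRouter:
    """Routes documents to shards by hashing their _id (illustrative only)."""

    def __init__(self, num_shards):
        # Each shard maps collection name -> {_id: document}.
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # Stable hash, so the same key always maps to the same shard.
        digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def insert(self, collection, doc):
        shard = self._shard_for(doc["_id"])
        shard.setdefault(collection, {})[doc["_id"]] = doc

    def find_one(self, collection, doc_id):
        shard = self._shard_for(doc_id)
        return shard.get(collection, {}).get(doc_id)


router = ToyRouter(num_shards=3)

# A user document with a nested address document and an embedded list.
router.insert("users", {
    "_id": "u1",
    "name": "Alice",
    "address": {"city": "Lahore", "country": "Pakistan"},
    "tags": ["admin", "beta"],
})

# A post refers to its author only by id; nothing enforces this link.
router.insert("posts", {"_id": "p1", "author_id": "u1", "title": "Hello"})

# Application-level "join": fetch the post, then fetch its author separately.
post = router.find_one("posts", "p1")
author = router.find_one("users", post["author_id"])
print(post["title"], "by", author["name"], "from", author["address"]["city"])
# prints: Hello by Alice from Lahore
```

Because the post stores only `author_id`, fetching the author is a second lookup that may be routed to a different node; keeping data that the application reads together inside one nested document avoids that extra round trip.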
MongoDB is most suitable for cases where scalability and dynamic schema support are required. If your website experiences heavy traffic and your existing database is struggling to scale, or you find yourself changing your data schema too often, then MongoDB is a good place to look. The software can be downloaded from the official website http://www.mongodb.org/downloads. Installers are available for the Windows and Mac OS X platforms. For Linux-based distributions, using the distribution’s package manager is a better option. Under Ubuntu, the command to install the package is:
sudo apt-get install mongodb
Once it is installed, the mongo client program can be used to connect to the database server and issue commands. A quick tutorial covering basic operations can be found at http://www.mongodb.org/display/DOCS/Tutorial.
Next up is Redis. Sponsored by VMware, Redis is an open source, networked, in-memory key-value store. In a key-value model, data is stored in the form of a dictionary: each item has a unique key, and a value is stored under that key. Values can be of several types, including strings, hashes, lists, sets and sorted sets, which allows complex data structures to be stored in Redis. Because Redis keeps its dataset in memory, it can achieve very high throughput; however, since the data must fit in RAM, it is not designed to store huge amounts of data. In fact, one of the most appropriate uses of Redis is as a caching datastore that speeds up certain operations and reduces load on the backend database server.
The official website of Redis is located at http://redis.io/ and Redis can be downloaded from http://redis.io/download. For Linux-based distributions, using the default package manager is a better option. Under Ubuntu, the command to install Redis is:
sudo apt-get install redis-server
The Redis client can then be opened by typing redis-cli at the command prompt, which opens a connection to the Redis server. Redis has a relatively small set of commands, and comprehensive documentation is available at http://redis.io/documentation.
The last database we cover is Apache Cassandra. Started at Facebook, then open-sourced and donated to the Apache Software Foundation, Cassandra is a schema-less, distributed database management system. Its data model is inspired by Google’s Bigtable, in which keys map to multiple values and values are grouped together in column families. Each key thus identifies a row containing a variable number of elements; each element belongs to a column, and related columns are grouped together in a column family. There is no support for SQL (or for joining rows), so any combination or aggregation of data has to be done at the application level. A key strength of Cassandra is its massive scalability, thanks to its decentralized, distributed architecture and built-in fault tolerance mechanisms. These features make Cassandra an ideal choice for applications that need to store very large amounts of data, such as Big Data applications.
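The row and column-family structure can be pictured as a two-level map: row key to column name to value. The following Python sketch models a single hypothetical `page_views` column family with plain dicts; it is not a Cassandra client API, and it ignores details such as timestamps, column ordering and replication. Rows hold different numbers of columns, and the totals are computed by the application, mirroring the lack of server-side joins and aggregation described above.

```python
from collections import defaultdict

# Hypothetical in-process model of one column family, using plain dicts:
# row key -> {column name: value}. This is not a Cassandra API.
page_views = defaultdict(dict)


def insert(row_key, column, value):
    """Write one column into a row; rows need not share the same columns."""
    page_views[row_key][column] = value


# Each row key is a page and each column name is a date, so rows grow independently.
insert("/home", "2012-05-01", 120)
insert("/home", "2012-05-02", 340)
insert("/about", "2012-05-02", 15)

# No joins or server-side aggregation: the application sums the columns itself.
totals = {page: sum(cols.values()) for page, cols in page_views.items()}
print(totals)  # {'/home': 460, '/about': 15}
```

Using dates as column names lets each page's row gain one column per day, which is one way of laying out time-series data in this kind of wide-row model.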
To download Cassandra, navigate to the official website http://cassandra.apache.org/download/. Installing Cassandra is a somewhat involved process, and it is recommended that you follow the guide available at http://wiki.apache.org/cassandra/GettingStarted. Once it is installed, the Cassandra CLI can be used to connect to the database and perform operations.
This completes our discussion of NoSQL databases. The decision to switch to a NoSQL database, and the choice of the right one, depends on the particular needs of the application. The summaries above of three major NoSQL databases should help you make an informed decision about which solution to implement.
Safari Books Online has the content you need
Take advantage of these NoSQL resources in Safari Books Online:
The rising popularity of Apache Cassandra rests on its ability to handle very large data sets that include hundreds of terabytes, and that’s why this distributed database has been chosen by organizations such as Facebook, Twitter, Digg, and Rackspace. With this hands-on guide, you’ll get all the details and practical examples you need to understand Cassandra’s non-relational database design and put it to work in a production environment.
MongoDB in Action is a comprehensive guide to MongoDB for application developers. The book begins by explaining what makes MongoDB unique and describing its ideal use cases. A series of tutorials designed for MongoDB mastery then leads into detailed examples for leveraging MongoDB in e-commerce, social networking, analytics, and other common applications.
Redis is an open source, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets. This book provides developers with problems and solutions in a useful cookbook style. This is an example-driven ebook.
About the authors
Salman Ul Haq is a techpreneur, co-founder and CEO of TunaCode, Inc., a startup that delivers GPU-accelerated computing solutions to time-critical application domains. He holds a degree in Computer Systems Engineering. His current focus is on delivering the right solution for cloud security. He can be reached at firstname.lastname@example.org.
Shaneeb Kamran is a Computer Engineer from one of the leading universities of Pakistan. His programming journey started at the age of 12, and ever since he has dabbled in every new and shiny software technology he could get his hands on. He is currently involved in a startup that is working on cloud computing products.