Mastering Apache Cassandra - Second Edition

Book description

Build, manage, and configure a high-performing, reliable NoSQL database for your application with Cassandra

In Detail

With ever-increasing rates of data creation comes the demand to store data as quickly and reliably as possible, a demand met by modern databases such as Cassandra. Apache Cassandra is the perfect choice for building fault-tolerant and scalable databases. Through this practical guide, you will program pragmatically and gain a thorough understanding of the power of Cassandra. Starting with a brief recap of the basics to get everyone up and running, you will move on to deploy and monitor a production setup, dive under the hood, and optimize Cassandra and integrate it with other software.

You will explore the integration and interaction of Cassandra components, along with great new features such as CQL3, vnodes, lightweight transactions, and triggers. Finally, by learning Hadoop and Pig, you will be able to analyze your big data.

What You Will Learn

  • Write programs using Cassandra's features more efficiently
  • Get the most out of a given infrastructure, improve performance, and tweak the JVM
  • Use CQL3 in your application, which makes working with Cassandra simpler
  • Configure Cassandra and fine-tune its parameters depending on your needs
  • Set up a cluster and learn how to scale it
  • Monitor a Cassandra cluster in different ways
  • Use Hadoop and other big data processing tools with Cassandra

Table of contents

  1. Mastering Apache Cassandra Second Edition
    1. Table of Contents
    2. Mastering Apache Cassandra Second Edition
    3. Credits
    4. About the Author
    5. About the Reviewers
    6. www.PacktPub.com
      1. Support files, eBooks, discount offers, and more
        1. Why subscribe?
        2. Free access for Packt account holders
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Errata
        3. Piracy
        4. Questions
    8. 1. Quick Start
      1. Introduction to Cassandra
        1. A distributed database
        2. High availability
        3. Replication
        4. Multiple data centers
      2. A brief introduction to a data model
      3. Installing Cassandra locally
      4. Cassandra in action
        1. Modeling data
        2. Writing code
          1. Setting up
          2. Inserting records
          3. Retrieving data
        3. Writing your application
          1. Getting the connection
          2. Executing queries
          3. Object mapping
      5. Summary
    9. 2. Cassandra Architecture
      1. Problems in the RDBMS world
      2. Enter NoSQL
        1. The CAP theorem
          1. Consistency
          2. Availability
          3. Partition-tolerance
        2. The significance of the CAP theorem
      3. Cassandra
      4. Understanding the architecture of Cassandra
        1. Ring representation
          1. Virtual nodes
        2. How Cassandra works
          1. Write in action
          2. Read in action
        3. The components of Cassandra
          1. The messaging service
          2. Gossip
          3. Failure detection
            1. Gossip and failure detection
          4. Partitioner
          5. Replication
            1. The notorious R + W > N inequality
          6. LSM tree
          7. Commit log
          8. MemTable
          9. SSTable
            1. The bloom filter
            2. Index files
            3. Data files
          10. Compaction
          11. Tombstones
          12. Hinted handoff
          13. Read repair and anti-entropy
            1. Merkle tree
      5. Summary
    10. 3. Effective CQL
      1. The Cassandra data model
        1. The counter column (cell)
        2. The expiring cell
        3. The column family
        4. Keyspaces
        5. Data types
          1. The primary index
        6. CQL3
          1. Creating a keyspace
            1. SimpleStrategy
            2. NetworkTopologyStrategy
          2. Altering a keyspace
          3. Creating a table
            1. Table properties
          4. Altering a table
            1. Adding a column
            2. Renaming a column
            3. Changing the data type
            4. Dropping a column
            5. Updating the table properties
          5. Dropping a table
          6. Creating an index
          7. Dropping an index
          8. Creating a data type
          9. Altering a custom type
          10. Dropping a custom type
          11. Creating triggers
          12. Dropping a trigger
          13. Creating a user
          14. Altering a user
          15. Dropping a user
        7. The granting permission
          1. Revoking permission using REVOKE
          2. Inserting data
            1. Collections in CQL
            2. Lists
            3. Sets
            4. Maps
          3. Lightweight transactions
          4. Updating a row
          5. Deleting a row
          6. Executing the BATCH statement
          7. Other CQL commands
            1. USE
            2. TRUNCATE
            3. LIST USERS
            4. LIST PERMISSIONS
      2. CQL shell commands
            1. DESCRIBE
            2. TRACING
            3. CONSISTENCY
            4. COPY
            5. CAPTURE
            6. ASSUME
            7. SOURCE
            8. SHOW
            9. EXIT
      3. Summary
    11. 4. Deploying a Cluster
      1. Evaluating requirements
        1. Hard disk capacity
          1. RAM
          2. CPU
          3. Is node a server?
          4. Network
      2. System configurations
        1. Optimizing user limits
        2. Swapping memory
        3. Clock synchronization
        4. Disk readahead
      3. The required software
        1. Installing Oracle Java 7
          1. RHEL and CentOS systems
          2. Debian and Ubuntu systems
        2. Installing the Java Native Access library
      4. Installing Cassandra
        1. Installing from a tarball
        2. Installing from ASF repository for Debian or Ubuntu
        3. Anatomy of the installation
          1. Cassandra binaries
          2. Configuration files
            1. Setting up data and commitlog directories
      5. Configuring a Cassandra cluster
        1. The cluster name
        2. The seed node
          1. Listen, broadcast, and RPC addresses
        3. num_tokens versus initial_token
        4. num_tokens
        5. initial_token
        6. Partitioners
          1. The Random partitioner
          2. The Byte-ordered partitioner
          3. The Murmur3 partitioner
        7. Snitches
          1. SimpleSnitch
          2. PropertyFileSnitch
          3. GossipingPropertyFileSnitch
          4. RackInferringSnitch
          5. EC2Snitch
          6. EC2MultiRegionSnitch
        8. Replica placement strategies
          1. SimpleStrategy
          2. NetworkTopologyStrategy
            1. Multiple data center setups
        9. Launching a cluster with a script
        10. Creating a keyspace
      6. Authorization and authentication
      7. Summary
    12. 5. Performance Tuning
      1. Stress testing
        1. Database schema
        2. Data distribution
        3. Write pattern
        4. Read queries
      2. Performance tuning
        1. Write performance
        2. Read performance
          1. Choosing the right compaction strategy
          2. Size-tiered compaction strategy
          3. Leveled compaction
          4. Row cache
          5. Key cache
          6. Cache settings
          7. Enabling compression
          8. Tuning the bloom filter
        3. More tuning via cassandra.yaml
          1. commitlog_sync
          2. column_index_size_in_kb
          3. commitlog_total_space_in_mb
        4. Tweaking JVM
          1. Java heap
          2. Garbage collection
          3. Other JVM options
        5. Scaling horizontally and vertically
        6. Network
      3. Summary
    13. 6. Managing a Cluster – Scaling, Node Repair, and Backup
      1. Scaling
        1. Adding nodes to a cluster
          1. Adding new nodes in vnode-enabled clusters
          2. Adding a new node to a cluster without vnodes
        2. Removing nodes from a cluster
          1. Removing a live node
          2. Removing a dead node
      2. Replacing a node
      3. Backup and restoration
        1. Using the Cassandra bulk loader to restore the data
      4. Load balancing
      5. DataStax OpsCenter – managing large clusters
      6. Summary
    14. 7. Monitoring
      1. Cassandra's JMX interface
        1. Accessing MBeans using JConsole
      2. Cassandra's nodetool utility
        1. Monitoring with nodetool
          1. cfstats
          2. netstats
          3. status
          4. ring and describering
          5. tpstats
          6. compactionstats
          7. info
        2. Managing administration with nodetool
          1. drain
          2. decommission
          3. removenode
          4. move
          5. repair
          6. upgradesstables
          7. snapshot
      3. DataStax OpsCenter
        1. The OpsCenter features
        2. Installing OpsCenter and an agent
        3. Prerequisites
          1. Running a Cassandra cluster
          2. Installing OpsCenter from tarball
          3. Setting up an OpsCenter agent
        4. Monitoring and administrating with OpsCenter
        5. Other features of OpsCenter
      4. Nagios – monitoring and notification
        1. Installing Nagios
          1. Prerequisites
          2. Preparation
          3. Installation
            1. Installing Nagios
            2. Configuring Apache httpd
            3. Installing Nagios plugins
            4. Setting up Nagios as a service
          4. Nagios plugins
            1. Nagios plugins for Cassandra
              1. Executing remote plugins via the NRPE plugin
              2. Installing NRPE on host machines
              3. Installing the NRPE plugin on a Nagios machine
            2. Setting up things to monitor
            3. Monitoring and notification using Nagios
      5. Cassandra log
        1. Enabling Java options for GC logging
      6. Troubleshooting
        1. High CPU usage
        2. High memory usage
        3. Hotspots
        4. OpenJDK's erratic behavior
        5. Disk performance
        6. Slow snapshots
        7. Getting help from the mailing list
      7. Summary
    15. 8. Integration with Hadoop
      1. Using Hadoop
      2. Hadoop and Cassandra
        1. Introduction to Hadoop
          1. HDFS
          2. Data management
            1. NameNode
            2. DataNodes
          3. Hadoop MapReduce
            1. JobTracker
            2. TaskTracker
          4. Reliability of data and processes in Hadoop
        2. Setting up local Hadoop
        3. Testing the installation
      3. Cassandra with Hadoop MapReduce
        1. Preparing Cassandra for Hadoop
        2. ColumnFamilyInputFormat
        3. ColumnFamilyOutputFormat
        4. CqlOutputFormat and CqlInputFormat
        5. ConfigHelper
          1. Wide row support
          2. Bulk loading
          3. Secondary index support
      4. Cassandra and Hadoop in action
        1. Executing, debugging, monitoring, and looking at results
      5. Hadoop in a Cassandra cluster
        1. Cassandra filesystem
      6. Integration with Pig
        1. Installing Pig
        2. Integrating Pig and Cassandra
        3. Integration with other analytical tools
      7. Summary
    16. Index

Product information

  • Title: Mastering Apache Cassandra - Second Edition
  • Author(s): Nishant Neeraj
  • Release date: March 2015
  • Publisher(s): Packt Publishing
  • ISBN: 9781784392611