Mastering Apache Cassandra - Second Edition

Book Description

Build, manage, and configure a high-performing, reliable NoSQL database for your application with Cassandra

In Detail

With ever-increasing rates of data creation comes the demand to store data as fast and as reliably as possible, a demand met by modern databases such as Cassandra. Apache Cassandra is well suited to building fault-tolerant and scalable databases. Through this practical guide, you will learn to program with Cassandra pragmatically and come to understand its full power. Starting with a brief recap of the basics to get everyone up and running, you will move on to deploying and monitoring a production setup, diving under the hood, and optimizing Cassandra and integrating it with other software.

You will explore how Cassandra's components integrate and interact, and discover new features such as CQL3, vnodes, lightweight transactions, and triggers. Finally, by learning Hadoop and Pig, you will be able to analyze your big data.

What You Will Learn

  • Write programs using Cassandra's features more efficiently

  • Get the most out of a given infrastructure, improve performance, and tweak the JVM

  • Use CQL3 in your application, which makes working with Cassandra simpler

  • Configure Cassandra and fine-tune its parameters depending on your needs

  • Set up a cluster and learn how to scale it

  • Monitor a Cassandra cluster in different ways

  • Use Hadoop and other big data processing tools with Cassandra

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

    Table of Contents

    1. Mastering Apache Cassandra Second Edition
      1. Table of Contents
      2. Mastering Apache Cassandra Second Edition
      3. Credits
      4. About the Author
      5. About the Reviewers
      6. www.PacktPub.com
        1. Support files, eBooks, discount offers, and more
          1. Why subscribe?
          2. Free access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Errata
          3. Piracy
          4. Questions
      8. 1. Quick Start
        1. Introduction to Cassandra
          1. A distributed database
          2. High availability
          3. Replication
          4. Multiple data centers
        2. A brief introduction to a data model
        3. Installing Cassandra locally
        4. Cassandra in action
          1. Modeling data
          2. Writing code
            1. Setting up
            2. Inserting records
            3. Retrieving data
          3. Writing your application
            1. Getting the connection
            2. Executing queries
            3. Object mapping
        5. Summary
      9. 2. Cassandra Architecture
        1. Problems in the RDBMS world
        2. Enter NoSQL
          1. The CAP theorem
            1. Consistency
            2. Availability
            3. Partition-tolerance
          2. The significance of the CAP theorem
        3. Cassandra
        4. Understanding the architecture of Cassandra
          1. Ring representation
            1. Virtual nodes
          2. How Cassandra works
            1. Write in action
            2. Read in action
          3. The components of Cassandra
            1. The messaging service
            2. Gossip
            3. Failure detection
              1. Gossip and failure detection
            4. Partitioner
            5. Replication
              1. The notorious R + W > N inequality
            6. LSM tree
            7. Commit log
            8. MemTable
            9. SSTable
              1. The bloom filter
              2. Index files
              3. Data files
            10. Compaction
            11. Tombstones
            12. Hinted handoff
            13. Read repair and anti-entropy
              1. Merkle tree
        5. Summary
      10. 3. Effective CQL
        1. The Cassandra data model
          1. The counter column (cell)
          2. The expiring cell
          3. The column family
          4. Keyspaces
          5. Data types
            1. The primary index
          6. CQL3
            1. Creating a keyspace
              1. SimpleStrategy
              2. NetworkTopologyStrategy
            2. Altering a keyspace
            3. Creating a table
              1. Table properties
            4. Altering a table
              1. Adding a column
              2. Renaming a column
              3. Changing the data type
              4. Dropping a column
              5. Updating the table properties
            5. Dropping a table
            6. Creating an index
            7. Dropping an index
            8. Creating a data type
            9. Altering a custom type
            10. Dropping a custom type
            11. Creating triggers
            12. Dropping a trigger
            13. Creating a user
            14. Altering a user
            15. Dropping a user
          7. Granting permission using GRANT
            1. Revoking permission using REVOKE
            2. Inserting data
              1. Collections in CQL
              2. Lists
              3. Sets
              4. Maps
            3. Lightweight transactions
            4. Updating a row
            5. Deleting a row
            6. Executing the BATCH statement
            7. Other CQL commands
              1. USE
              2. TRUNCATE
              3. LIST USERS
              4. LIST PERMISSIONS
        2. CQL shell commands
              1. DESCRIBE
              2. TRACING
              3. CONSISTENCY
              4. COPY
              5. CAPTURE
              6. ASSUME
              7. SOURCE
              8. SHOW
              9. EXIT
        3. Summary
      11. 4. Deploying a Cluster
        1. Evaluating requirements
          1. Hard disk capacity
            1. RAM
            2. CPU
            3. Is a node a server?
            4. Network
        2. System configurations
          1. Optimizing user limits
          2. Swapping memory
          3. Clock synchronization
          4. Disk readahead
        3. The required software
          1. Installing Oracle Java 7
            1. RHEL and CentOS systems
            2. Debian and Ubuntu systems
          2. Installing the Java Native Access library
        4. Installing Cassandra
          1. Installing from a tarball
          2. Installing from the ASF repository for Debian or Ubuntu
          3. Anatomy of the installation
            1. Cassandra binaries
            2. Configuration files
              1. Setting up data and commitlog directories
        5. Configuring a Cassandra cluster
          1. The cluster name
          2. The seed node
            1. Listen, broadcast, and RPC addresses
          3. num_tokens versus initial_token
          4. num_tokens
          5. initial_token
          6. Partitioners
            1. The Random partitioner
            2. The Byte-ordered partitioner
            3. The Murmur3 partitioner
          7. Snitches
            1. SimpleSnitch
            2. PropertyFileSnitch
            3. GossipingPropertyFileSnitch
            4. RackInferringSnitch
            5. EC2Snitch
            6. EC2MultiRegionSnitch
          8. Replica placement strategies
            1. SimpleStrategy
            2. NetworkTopologyStrategy
              1. Multiple data center setups
          9. Launching a cluster with a script
          10. Creating a keyspace
        6. Authorization and authentication
        7. Summary
      12. 5. Performance Tuning
        1. Stress testing
          1. Database schema
          2. Data distribution
          3. Write pattern
          4. Read queries
        2. Performance tuning
          1. Write performance
          2. Read performance
            1. Choosing the right compaction strategy
            2. Size-tiered compaction strategy
            3. Leveled compaction
            4. Row cache
            5. Key cache
            6. Cache settings
            7. Enabling compression
            8. Tuning the bloom filter
          3. More tuning via cassandra.yaml
            1. commitlog_sync
            2. column_index_size_in_kb
            3. commitlog_total_space_in_mb
          4. Tweaking JVM
            1. Java heap
            2. Garbage collection
            3. Other JVM options
          5. Scaling horizontally and vertically
          6. Network
        3. Summary
      13. 6. Managing a Cluster – Scaling, Node Repair, and Backup
        1. Scaling
          1. Adding nodes to a cluster
            1. Adding new nodes in vnode-enabled clusters
            2. Adding a new node to a cluster without vnodes
          2. Removing nodes from a cluster
            1. Removing a live node
            2. Removing a dead node
        2. Replacing a node
        3. Backup and restoration
          1. Using the Cassandra bulk loader to restore the data
        4. Load balancing
        5. DataStax OpsCenter – managing large clusters
        6. Summary
      14. 7. Monitoring
        1. Cassandra's JMX interface
          1. Accessing MBeans using JConsole
        2. Cassandra's nodetool utility
          1. Monitoring with nodetool
            1. cfstats
            2. netstats
            3. status
            4. ring and describering
            5. tpstats
            6. compactionstats
            7. info
          2. Managing administration with nodetool
            1. drain
            2. decommission
            3. removenode
            4. move
            5. repair
            6. upgradesstables
            7. snapshot
        3. DataStax OpsCenter
          1. The OpsCenter features
          2. Installing OpsCenter and an agent
          3. Prerequisites
            1. Running a Cassandra cluster
            2. Installing OpsCenter from tarball
            3. Setting up an OpsCenter agent
          4. Monitoring and administrating with OpsCenter
          5. Other features of OpsCenter
        4. Nagios – monitoring and notification
          1. Installing Nagios
            1. Prerequisites
            2. Preparation
            3. Installation
              1. Installing Nagios
              2. Configuring Apache httpd
              3. Installing Nagios plugins
              4. Setting up Nagios as a service
            4. Nagios plugins
              1. Nagios plugins for Cassandra
                1. Executing remote plugins via the NRPE plugin
                2. Installing NRPE on host machines
                3. Installing the NRPE plugin on a Nagios machine
              2. Setting up things to monitor
              3. Monitoring and notification using Nagios
        5. Cassandra log
          1. Enabling Java options for GC logging
        6. Troubleshooting
          1. High CPU usage
          2. High memory usage
          3. Hotspots
          4. OpenJDK's erratic behavior
          5. Disk performance
          6. Slow snapshots
          7. Getting help from the mailing list
        7. Summary
      15. 8. Integration with Hadoop
        1. Using Hadoop
        2. Hadoop and Cassandra
          1. Introduction to Hadoop
            1. HDFS
            2. Data management
              1. NameNode
              2. DataNodes
            3. Hadoop MapReduce
              1. JobTracker
              2. TaskTracker
            4. Reliability of data and processes in Hadoop
          2. Setting up local Hadoop
          3. Testing the installation
        3. Cassandra with Hadoop MapReduce
          1. Preparing Cassandra for Hadoop
          2. ColumnFamilyInputFormat
          3. ColumnFamilyOutputFormat
          4. CqlOutputFormat and CqlInputFormat
          5. ConfigHelper
            1. Wide row support
            2. Bulk loading
            3. Secondary index support
        4. Cassandra and Hadoop in action
          1. Executing, debugging, monitoring, and looking at results
        5. Hadoop in a Cassandra cluster
          1. Cassandra filesystem
        6. Integration with Pig
          1. Installing Pig
          2. Integrating Pig and Cassandra
          3. Integration with other analytical tools
        7. Summary
      16. Index