Cassandra: The Definitive Guide, 2nd Edition
O'Reilly Media

Book Description

Imagine what you could do if scalability wasn't a problem. With this hands-on guide, you’ll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This expanded second edition—updated for Cassandra 3.0—provides the technical details and practical examples you need to put this database to work in a production environment.

Table of Contents

  1. Foreword
  2. Foreword
  3. Preface
    1. Why Apache Cassandra?
    2. Is This Book for You?
    3. What’s in This Book?
      1. New for the Second Edition
    4. Conventions Used in This Book
    5. Using Code Examples
    6. Safari® Books Online
    7. How to Contact Us
    8. Acknowledgments
  4. Chapter 1. Beyond Relational Databases
    1. What’s Wrong with Relational Databases?
    2. A Quick Review of Relational Databases
      1. RDBMSs: The Awesome and the Not-So-Much
    3. Web Scale
    4. The Rise of NoSQL
    5. Summary
  5. Chapter 2. Introducing Cassandra
    1. The Cassandra Elevator Pitch
      1. Cassandra in 50 Words or Less
      2. Distributed and Decentralized
      3. Elastic Scalability
      4. High Availability and Fault Tolerance
      5. Tuneable Consistency
      6. Brewer’s CAP Theorem
      7. Row-Oriented
      8. High Performance
    2. Where Did Cassandra Come From?
      1. Release History
    3. Is Cassandra a Good Fit for My Project?
      1. Large Deployments
      2. Lots of Writes, Statistics, and Analysis
      3. Geographical Distribution
      4. Evolving Applications
    4. Getting Involved
    5. Summary
  6. Chapter 3. Installing Cassandra
    1. Installing the Apache Distribution
      1. Extracting the Download
      2. What’s In There?
    2. Building from Source
      1. Additional Build Targets
    3. Running Cassandra
      1. On Windows
      2. On Linux
      3. Starting the Server
      4. Stopping Cassandra
    4. Other Cassandra Distributions
    5. Running the CQL Shell
    6. Basic cqlsh Commands
      1. cqlsh Help
      2. Describing the Environment in cqlsh
      3. Creating a Keyspace and Table in cqlsh
      4. Writing and Reading Data in cqlsh
    7. Summary
  7. Chapter 4. The Cassandra Query Language
    1. The Relational Data Model
    2. Cassandra’s Data Model
      1. Clusters
      2. Keyspaces
      3. Tables
      4. Columns
    3. CQL Types
      1. Numeric Data Types
      2. Textual Data Types
      3. Time and Identity Data Types
      4. Other Simple Data Types
      5. Collections
      6. User-Defined Types
    4. Secondary Indexes
    5. Summary
  8. Chapter 5. Data Modeling
    1. Conceptual Data Modeling
    2. RDBMS Design
      1. Design Differences Between RDBMS and Cassandra
    3. Defining Application Queries
    4. Logical Data Modeling
      1. Hotel Logical Data Model
      2. Reservation Logical Data Model
    5. Physical Data Modeling
      1. Hotel Physical Data Model
      2. Reservation Physical Data Model
      3. Materialized Views
    6. Evaluating and Refining
      1. Calculating Partition Size
      2. Calculating Size on Disk
      3. Breaking Up Large Partitions
    7. Defining Database Schema
      1. DataStax DevCenter
    8. Summary
  9. Chapter 6. The Cassandra Architecture
    1. Data Centers and Racks
    2. Gossip and Failure Detection
    3. Snitches
    4. Rings and Tokens
    5. Virtual Nodes
    6. Partitioners
    7. Replication Strategies
    8. Consistency Levels
    9. Queries and Coordinator Nodes
    10. Memtables, SSTables, and Commit Logs
    11. Caching
    12. Hinted Handoff
    13. Lightweight Transactions and Paxos
    14. Tombstones
    15. Bloom Filters
    16. Compaction
    17. Anti-Entropy, Repair, and Merkle Trees
    18. Staged Event-Driven Architecture (SEDA)
    19. Managers and Services
      1. Cassandra Daemon
      2. Storage Engine
      3. Storage Service
      4. Storage Proxy
      5. Messaging Service
      6. Stream Manager
      7. CQL Native Transport Server
    20. System Keyspaces
    21. Summary
  10. Chapter 7. Configuring Cassandra
    1. Cassandra Cluster Manager
    2. Creating a Cluster
    3. Seed Nodes
    4. Partitioners
      1. Murmur3 Partitioner
      2. Random Partitioner
      3. Order-Preserving Partitioner
      4. ByteOrderedPartitioner
    5. Snitches
      1. Simple Snitch
      2. Property File Snitch
      3. Gossiping Property File Snitch
      4. Rack Inferring Snitch
      5. Cloud Snitches
      6. Dynamic Snitch
    6. Node Configuration
      1. Tokens and Virtual Nodes
      2. Network Interfaces
      3. Data Storage
      4. Startup and JVM Settings
    7. Adding Nodes to a Cluster
    8. Dynamic Ring Participation
    9. Replication Strategies
      1. SimpleStrategy
      2. NetworkTopologyStrategy
      3. Changing the Replication Factor
    10. Summary
  11. Chapter 8. Clients
    1. Hector, Astyanax, and Other Legacy Clients
    2. DataStax Java Driver
      1. Development Environment Configuration
      2. Clusters and Contact Points
      3. Sessions and Connection Pooling
      4. Statements
      5. Policies
      6. Metadata
      7. Debugging and Monitoring
    3. DataStax Python Driver
    4. DataStax Node.js Driver
    5. DataStax Ruby Driver
    6. DataStax C# Driver
    7. DataStax C/C++ Driver
    8. DataStax PHP Driver
    9. Summary
  12. Chapter 9. Reading and Writing Data
    1. Writing
      1. Write Consistency Levels
      2. The Cassandra Write Path
      3. Writing Files to Disk
      4. Lightweight Transactions
      5. Batches
    2. Reading
      1. Read Consistency Levels
      2. The Cassandra Read Path
      3. Read Repair
      4. Range Queries, Ordering and Filtering
      5. Functions and Aggregates
      6. Paging
      7. Speculative Retry
    3. Deleting
    4. Summary
  13. Chapter 10. Monitoring
    1. Logging
      1. Tailing
      2. Examining Log Files
    2. Monitoring Cassandra with JMX
      1. Connecting to Cassandra via JConsole
      2. Overview of MBeans
    3. Cassandra’s MBeans
      1. Database MBeans
      2. Networking MBeans
      3. Metrics MBeans
      4. Threading MBeans
      5. Service MBeans
      6. Security MBeans
    4. Monitoring with nodetool
      1. Getting Cluster Information
      2. Getting Statistics
    5. Summary
  14. Chapter 11. Maintenance
    1. Health Check
    2. Basic Maintenance
      1. Flush
      2. Cleanup
      3. Repair
      4. Rebuilding Indexes
      5. Moving Tokens
    3. Adding Nodes
      1. Adding Nodes to an Existing Data Center
      2. Adding a Data Center to a Cluster
    4. Handling Node Failure
      1. Repairing Nodes
      2. Replacing Nodes
      3. Removing Nodes
    5. Upgrading Cassandra
    6. Backup and Recovery
      1. Taking a Snapshot
      2. Clearing a Snapshot
      3. Enabling Incremental Backup
      4. Restoring from Snapshot
    7. SSTable Utilities
    8. Maintenance Tools
      1. DataStax OpsCenter
      2. Netflix Priam
    9. Summary
  15. Chapter 12. Performance Tuning
    1. Managing Performance
      1. Setting Performance Goals
      2. Monitoring Performance
      3. Analyzing Performance Issues
      4. Tracing
      5. Tuning Methodology
    2. Caching
      1. Key Cache
      2. Row Cache
      3. Counter Cache
      4. Saved Cache Settings
    3. Memtables
    4. Commit Logs
    5. SSTables
    6. Hinted Handoff
    7. Compaction
    8. Concurrency and Threading
    9. Networking and Timeouts
    10. JVM Settings
      1. Memory
      2. Garbage Collection
    11. Using cassandra-stress
    12. Summary
  16. Chapter 13. Security
    1. Authentication and Authorization
      1. Password Authenticator
      2. Using CassandraAuthorizer
      3. Role-Based Access Control
    2. Encryption
      1. SSL, TLS, and Certificates
      2. Node-to-Node Encryption
      3. Client-to-Node Encryption
    3. JMX Security
      1. Securing JMX Access
      2. Security MBeans
    4. Summary
  17. Chapter 14. Deploying and Integrating
    1. Planning a Cluster Deployment
      1. Sizing Your Cluster
      2. Selecting Instances
      3. Storage
      4. Network
    2. Cloud Deployment
      1. Amazon Web Services
      2. Microsoft Azure
      3. Google Cloud Platform
    3. Integrations
      1. Apache Lucene, Solr, and Elasticsearch
      2. Apache Hadoop
      3. Apache Spark
    4. Summary
  18. Index