Cassandra: The Definitive Guide

By Eben Hewitt. Published by O'Reilly Media, Inc.

Chapter 8. Clients

We’re used to connecting to relational databases using drivers. For example, in Java, JDBC is an API that abstracts the vendor implementation of the relational database to present a consistent way of storing and retrieving data using Statements, PreparedStatements, ResultSets, and so forth. To interact with the database you get a driver that works with the particular database you’re using, such as Oracle, SQL Server, or MySQL; the implementation details of this interaction are hidden from the developer. Given the right driver, you can use a wide variety of programming languages to connect to a wide variety of databases.
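JDBC itself is a Java API, but the same driver idea exists in other languages. As a rough sketch only, the following uses Python's DB-API with the standard library's sqlite3 driver to show what "a consistent way of storing and retrieving data" looks like in practice. The function name `store_and_fetch`, the `hotel` table, and the sample row are hypothetical and chosen purely for illustration.

```python
import sqlite3


def store_and_fetch(connect, dsn):
    """Store one row and read it back using only DB-API style calls.

    `connect` is any driver's connect function; this sketch passes
    sqlite3.connect. The table and data are hypothetical.
    """
    conn = connect(dsn)
    try:
        cur = conn.cursor()
        cur.execute("CREATE TABLE hotel (name TEXT, city TEXT)")
        # The "?" placeholder is sqlite3's parameter style; other
        # drivers may use a different one (for example "%s").
        cur.execute(
            "INSERT INTO hotel (name, city) VALUES (?, ?)",
            ("Example Inn", "Phoenix"),
        )
        conn.commit()
        cur.execute("SELECT name, city FROM hotel WHERE city = ?", ("Phoenix",))
        return cur.fetchall()
    finally:
        conn.close()


# An in-memory database keeps the example self-contained.
rows = store_and_fetch(sqlite3.connect, ":memory:")
print(rows)
```

The calling code touches only connect, cursor, execute, commit, and fetchall; the vendor-specific work stays inside the driver, which is the role JDBC drivers play for Java.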

Cassandra is somewhat different in that there are no drivers for it. If you’ve decided to use Python to interact with Cassandra, you don’t go out and find a Cassandra driver for Python; there is no such thing. Rather than abstracting the database interactions behind a driver, the way JDBC does, Cassandra uses an entirely different mechanism: a client generation layer, provided by the Thrift API and the Avro project. There are also high-level Cassandra clients for Java, Scala, Ruby, C#, Python, Perl, PHP, C++, and other languages, written as conveniences by third-party developers.

These clients have clear benefits: you can easily embed them in your own applications (which we’ll see how to do), and they frequently offer more features than the basic Thrift interface does, including connection pooling and JMX ...
