Beginning Apache Cassandra Development

Book description

Beginning Apache Cassandra Development introduces you to one of the most robust and best-performing NoSQL database platforms on the planet. Apache Cassandra is a distributed wide-column database, not a JSON document store. It is specifically designed to manage large amounts of data across many commodity servers with no single point of failure. This design approach makes Apache Cassandra a robust and easy-to-implement platform when high availability is needed.

Apache Cassandra can be used by developers in Java, PHP, Python, and JavaScript—the primary and most commonly used languages. In Beginning Apache Cassandra Development, author and Cassandra expert Vivek Mishra takes you through using Apache Cassandra from each of these primary languages. Mishra also covers the Cassandra Query Language (CQL), the Apache Cassandra analog to SQL. You'll learn to develop applications sourcing data from Cassandra, query that data, and deliver it at speed to your application's users.

Cassandra is one of the leading NoSQL databases, offering high throughput and performance without the sort of processing overhead that comes with traditional proprietary databases. Beginning Apache Cassandra Development will therefore help you create applications that generate search results quickly, stand up to high levels of demand, scale as your user base grows, ensure operational simplicity, and, not least, provide delightful user experiences.

Table of contents

  1. Cover
  2. Title
  3. Copyright
  4. Dedication
  5. Contents at a Glance
  6. Contents
  7. About the Author
  8. About the Technical Reviewer
  9. Acknowledgments
  10. Introduction
  11. Chapter 1: NoSQL: Cassandra Basics
    1. Introducing NoSQL
      1. NoSQL Ecosystem
      2. CAP Theorem
      3. Budding Schema
      4. Scalability
    2. Identifying the Big Data Problem
    3. Introducing Cassandra
      1. Distributed Databases
      2. Peer-to-Peer Design
      3. Configurable Data Consistency
      4. Cassandra Query Language (CQL)
    4. Installing Cassandra
    5. Logging in Cassandra
      1. Application Logging Options
      2. Changing Log Properties
      3. Managing Logs via JConsole
      4. Commit Log Archival
    6. Configuring Replication and Data Center
      1. LocalStrategy
      2. NetworkTopologyStrategy
      3. SimpleStrategy
      4. Cassandra Multiple Node Configuration
    7. Summary
  12. Chapter 2: Cassandra Data Modeling
    1. Introducing Data Modeling
    2. Data Types
    3. Dynamic Columns
      1. Dynamic Columns via Thrift
      2. Dynamic Columns via cqlsh Using Map Support
      3. Dynamic Columns via cqlsh Using Set Support
    4. Secondary Indexes
    5. CQL3 and Thrift Interoperability
    6. Changing Data Types
      1. Thrift Way
      2. CQL3 Way
    7. Counter Column
      1. Counter Column with and without replicate_on_write
      2. Play with Counter Columns
    8. Data Modeling Tips
    9. Summary
  13. Chapter 3: Indexes and Composite Columns
    1. Indexes
      1. Clustered Indexes vs. Non-Clustered Indexes
      2. Index Distribution
      3. Indexing in Cassandra
      4. Secondary Indexes
    2. Composite Columns
      1. Allow Filtering
      2. Expiring Columns
      3. Default TTL
      4. Data Partitioning
    3. What’s New in Cassandra 2.0
      1. Compare and Set
      2. Secondary Index over Composite Columns
      3. Conditional DDL
    4. Summary
  14. Chapter 4: Cassandra Data Security
    1. Authentication and Authorization
      1. system and system_auth Keyspaces
      2. Managing User Permissions
      3. Accessing system_auth with AllowAllAuthorizer
    2. Preparing Server Certificates
    3. Connecting with SSL Encryption
      1. Connecting via Cassandra-cli
      2. Connecting via cqlsh
      3. Connecting via the Cassandra Thrift Client
    4. Summary
  15. Chapter 5: MapReduce with Cassandra
    1. Batch Processing and MapReduce
    2. Apache Hadoop
      1. HDFS
      2. MapReduce
      3. Read and Store Tweets into HDFS
    3. Cassandra MapReduce Integration
      1. Reading Tweets from HDFS and Storing Count Results into Cassandra
      2. Cassandra In and Cassandra Out
    4. Stream or Real-Time Analytics
    5. Summary
  16. Chapter 6: Data Migration and Analytics
    1. Data Migration and Analytics
    2. Apache Pig
      1. Setup and Installation
      2. Understanding Pig
      3. Counting Tweets
      4. Pig with Cassandra
    3. Apache Hive
      1. Setup and Configuration
      2. Understanding UDF, UDAF, and UDTF
      3. Hive Tables
      4. Local FS Data Loading
      5. HDFS Data Loading
      6. Hive External Table
      7. Hive with Cassandra
    4. Data Migration
      1. In the Traditional Way
      2. Apache Sqoop
      3. Sqoop with Cassandra
    5. Summary
  17. Chapter 7: Titan Graph Databases with Cassandra
    1. Introduction to Graphs
      1. Simple and Nonsimple Graphs
      2. Directed and Undirected Graphs
      3. Cyclic and Acyclic Graphs
    2. Open Source Software for Graphs
      1. Graph Frameworks: TinkerPop
      2. Graph as a Database
    3. Titan Graph Databases
      1. Basic Concepts
      2. Setup and Installation
      3. Command-line Tools and Clients
    4. Titan with Cassandra
      1. Titan Java API
      2. Cassandra for Backend Storage
      3. Use Cases
    5. Summary
  18. Chapter 8: Cassandra Performance Tuning
    1. Understanding the Key Performance Indicators
      1. CPU and Memory Utilization
      2. Heavy Read/Write Throughput and Latency
      3. Logical and Physical Reads
    2. Cassandra Configuration
      1. Data Caches
      2. Bloom Filters
      3. Off-Heap vs. On-Heap
    3. Cassandra Stress Testing
      1. Write Mode
      2. Read Mode
      3. Monitoring
      4. Compaction Strategy
    4. Yahoo Cloud Serving Benchmarking
    5. Summary
  19. Chapter 9: Cassandra: Administration and Monitoring
    1. Adding Nodes to Cassandra Cluster
    2. Replacing a Dead Node
    3. Data Backup and Restoration
      1. Using nodetool snapshot and sstableloader
      2. Using nodetool refresh
      3. Using clearsnapshot
    4. Cassandra Monitoring Tools
      1. Helenos
      2. DataStax DevCenter and OpsCenter
    5. Summary
  20. Chapter 10: Cassandra Utilities
    1. Cassandra nodetool Utility
      1. Ring Management
      2. Schema Management
    2. JSONifying Data
      1. Exporting Data to JSON Files with sstable2json
      2. Importing JSON Data with json2sstable
    3. Cassandra Bulk Loading
    4. Summary
  21. Chapter 11: Upgrading Cassandra and Troubleshooting
    1. Cassandra 2.1
      1. User-Defined Types
      2. Frozen Types
      3. Indexing on Collection Attributes
    2. Upgrading Cassandra Versions
      1. Backward Compatibility
      2. Performing an Upgrade with a Rolling Restart
    3. Troubleshooting Cassandra
      1. Too Many Open Files
      2. Stack Size Limit
      3. Out of Memory Errors
      4. Too Much Garbage Collection Activity
    4. Road Ahead with Cassandra
    5. Summary
    6. References
  22. Index

Product information

  • Title: Beginning Apache Cassandra Development
  • Author(s): Vivek Mishra
  • Release date: December 2014
  • Publisher(s): Apress
  • ISBN: 9781484201428