Cassandra Data Modeling and Analysis

Book Description

Design, build, and analyze your data intricately using Cassandra

In Detail

Starting with a quick introduction to Cassandra, this book covers fundamental data modeling approaches, the selection of data types, the design of a data model, and the choice of suitable keys and indexes. It then works through a real-world application, applying the best practices covered along the way.

Although the application is small, you will be involved in the full development life cycle. You will work through the design considerations behind a flexible and sustainable data model for a stock market technical-analysis application written in Python. Because business requirements change continually, and data models must change with them, you will also learn techniques for evolving a data model to address new requirements. Running a web-scale Cassandra cluster requires careful attention to data model evolution, performance tuning, and system monitoring. This book is an invaluable tutorial for anyone who wants to adopt Cassandra.

What You Will Learn

  • Discover the unique way of query-driven data modeling in Cassandra
  • Explore the differences between a data model of a relational database and that of Cassandra
  • Master the correct uses of the primary index, composite key, compound key, and secondary index
  • Design a high-performance Cassandra data model
  • Develop a complete, real-world technical-analysis application for the stock market
  • Grasp the techniques of evolving a data model in production
  • Determine effective performance tuning, replication, and system-monitoring strategies

Downloading the example code for this book: You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

    Table of Contents

    1. Cassandra Data Modeling and Analysis
      1. Table of Contents
      2. Cassandra Data Modeling and Analysis
      3. Credits
      4. About the Author
      5. About the Reviewers
      6. www.PacktPub.com
        1. Support files, eBooks, discount offers, and more
          1. Why subscribe?
          2. Free access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Downloading the color images of this book
          3. Errata
          4. Piracy
          5. Questions
      8. 1. Bird's Eye View of Cassandra
        1. What is NoSQL?
          1. NoSQL Database types
            1. Key/value pair store
            2. Column-family store
            3. Document-based repository
            4. Graph database
        2. What is Cassandra?
          1. Google BigTable
          2. Amazon Dynamo
        3. Cassandra's high-level architecture
          1. Partitioning
          2. Replication
          3. Snitch
          4. Seed node
          5. Gossip and Failure detection
          6. Write path
          7. Read path
          8. Repair mechanism
        4. Features of Cassandra
        5. Summary
      9. 2. Cassandra Data Modeling
        1. What is unique to the Cassandra data model?
          1. Map and SortedMap
          2. Logical data structure
            1. Column
            2. Row
            3. Column family
            4. Keyspace
            5. Super column and super column family
          3. Collections
          4. No foreign key
          5. No join
          6. No sequence
          7. Counter
          8. Time-To-Live
          9. Secondary index
        2. Modeling by query
          1. Relational version
          2. Cassandra version
        3. Data modeling considerations
          1. Data duplication
          2. Sorting
          3. Wide row
          4. Bucketing
          5. Valueless column
          6. Time-series data
        4. Cassandra Query Language
        5. Summary
      10. 3. CQL Data Types
        1. Introduction to CQL
          1. CQL statements
          2. CQL command-line client – cqlsh
          3. Native data types
          4. Cassandra implementation
          5. A not-so-long example
          6. ASCII
          7. Bigint
          8. BLOB
          9. Boolean
          10. Decimal
          11. Double
          12. Float
          13. Inet
          14. Int
          15. Text
          16. Timestamp
          17. Timeuuid
          18. UUID
          19. Varchar
          20. Varint
          21. Counter
        2. Collections
          1. Set
          2. List
          3. Map
        3. User-defined type and tuple type
        4. Summary
      11. 4. Indexes
        1. Primary index
        2. Compound primary key and composite partition key
          1. Time-series data
        3. Partitioner
          1. Murmur3Partitioner
          2. RandomPartitioner
          3. ByteOrderedPartitioner
          4. Paging and token function
        4. Secondary indexes
          1. Multiple secondary indexes
          2. Secondary index do's and don'ts
        5. Summary
      12. 5. First-cut Design and Implementation
        1. Stock Screener Application
          1. An introduction to financial analysis
          2. Stock quote data
          3. Initial data model
          4. Processing flow
        2. System design
          1. The operating system
          2. Java Runtime Environment
          3. Java Native Access
          4. Cassandra version
          5. Programming language
          6. Cassandra driver
          7. The integrated development environment
          8. The system overview
        3. Code design and development
          1. Data Feed Provider
            1. Collecting stock quote
            2. Transforming data
            3. Storing data in Cassandra
            4. Putting them all together
          2. Stock Screener
            1. Data Scoper
            2. Time-series data
            3. The screening rule
            4. The Stock Screener engine
        4. Test run
        5. Summary
      13. 6. Enhancing a Version
        1. Evolving the data model
          1. The enhancement approach
            1. Watch List
            2. Alert List
            3. Adding the descriptive stock name
            4. Queries on alerts
        2. Enhancing the code
          1. Data Mapper and Archiver
            1. Stock Screener Engine
            2. Queries on Alerts
        3. Implementing system changes
        4. Summary
      14. 7. Deployment and Monitoring
        1. Replication strategies
          1. Data replication
          2. SimpleStrategy
          3. NetworkTopologyStrategy
          4. Setting up the cluster for Stock Screener Application
            1. System and network configuration
            2. Global settings
            3. Configuration procedure
            4. Legacy data migration procedure
            5. Deploying the Stock Screener Application
        2. Monitoring
          1. Nodetool
          2. JMX and MBeans
          3. The system log
        3. Performance tuning
          1. Java virtual machine
          2. Caching
            1. Partition key cache
            2. Row cache
            3. Monitoring cache
            4. Enabling/disabling cache
        4. Summary
      15. 8. Final Thoughts
        1. Supplementary information
          1. Client drivers
          2. Security
            1. Authentication
            2. Authorization
            3. Inter-node encryption
          3. Backup and restore
        2. Useful websites
          1. Apache Cassandra official site
          2. PlanetCassandra
          3. DataStax
          4. Hadoop integration
        3. Summary
      16. Index