Next Generation Databases: NoSQL, NewSQL, and Big Data

Book Description

This is a book for enterprise architects, database administrators, and developers who need to understand the latest developments in database technologies. It helps you choose the correct database technology at a time when concepts such as Big Data, NoSQL, and NewSQL are turning what used to be an easy choice into a complex decision with significant implications.

The relational database (RDBMS) model completely dominated database technology for over 20 years. Today this "one size fits all" stability has been disrupted by a relatively recent explosion of new database technologies. These paradigm-busting technologies are powering the "Big Data" and "NoSQL" revolutions, as well as forcing fundamental changes in databases across the board.

Deciding to use a relational database was once truly a no-brainer, and the various commercial relational databases competed on price, performance, reliability, and ease of use rather than on fundamental architectures. Today we are faced with choices between radically different database technologies. Choosing the right database today is a complex undertaking, with serious economic and technological consequences.

Next Generation Databases demystifies today’s new database technologies. The book describes what each technology was designed to solve. It shows how each technology can be used to solve real-world application and business problems. Most importantly, this book highlights the architectural differences between technologies that are the critical factors to consider when choosing a database platform for new and upcoming projects.

  • Introduces the new technologies that have revolutionized the database landscape
  • Describes how each technology can be used to solve specific application or business challenges
  • Reviews the most popular new wave databases and how they use these new database technologies

Table of Contents

    1. Cover
    2. Title
    3. Copyright
    4. Dedication
    5. Contents at a Glance
    6. Contents
    7. About the Author
    8. About the Technical Reviewer
    9. Acknowledgments
    10. Part I: Next Generation Databases
      1. Chapter 1: Three Database Revolutions
        1. Early Database Systems
        2. The First Database Revolution
        3. The Second Database Revolution
          1. Relational theory
          2. Transaction Models
          3. The First Relational Databases
          4. Database Wars!
          5. Client-server Computing
          6. Object-oriented Programming and the OODBMS
          7. The Relational Plateau
        4. The Third Database Revolution
          1. Google and Hadoop
          2. The Rest of the Web
          3. Cloud Computing
          4. Document Databases
          5. The “NewSQL”
          6. The Nonrelational Explosion
        5. Conclusion: One Size Doesn’t Fit All
        6. Notes
      2. Chapter 2: Google, Big Data, and Hadoop
        1. The Big Data Revolution
          1. Cloud, Mobile, Social, and Big Data
        2. Google: Pioneer of Big Data
          1. Google Hardware
          2. The Google Software Stack
          3. More about MapReduce
        3. Hadoop: Open-Source Google Stack
          1. Hadoop’s Origins
          2. The Power of Hadoop
          3. Hadoop’s Architecture
          4. HBase
          5. Hive
          6. Pig
          7. The Hadoop Ecosystem
        4. Conclusion
        5. Notes
      3. Chapter 3: Sharding, Amazon, and the Birth of NoSQL
        1. Scaling Web 2.0
          1. How Web 2.0 was Won
          2. The Open-source Solution
          3. Sharding
          4. Death by a Thousand Shards
          5. CAP Theorem
          6. Eventual Consistency
        2. Amazon’s Dynamo
          1. Consistent Hashing
          2. Tunable Consistency
          3. Dynamo and the Key-value Store Family
        3. Conclusion
        4. Note
      4. Chapter 4: Document Databases
        1. XML and XML Databases
          1. XML Tools and Standards
          2. XML Databases
          3. XML Support in Relational Systems
        2. JSON Document Databases
          1. JSON and AJAX
          2. JSON Databases
          3. Data Models in Document Databases
          4. Early JSON Databases
          5. MemBase and CouchBase
          6. MongoDB
          7. JSON, JSON, Everywhere
        3. Conclusion
      5. Chapter 5: Tables are Not Your Friends: Graph Databases
        1. What is a Graph?
        2. RDBMS Patterns for Graphs
        3. RDF and SPARQL
        4. Property Graphs and Neo4j
        5. Gremlin
        6. Graph Database Internals
        7. Graph Compute Engines
        8. Conclusion
      6. Chapter 6: Column Databases
        1. Data Warehousing Schemas
        2. The Columnar Alternative
          1. Columnar Compression
          2. Columnar Write Penalty
        3. Sybase IQ, C-Store, and Vertica
        4. Column Database Architectures
          1. Projections
          2. Columnar Technology in Other Databases
        5. Conclusion
        6. Note
      7. Chapter 7: The End of Disk? SSD and In-Memory Databases
        1. The End of Disk?
          1. Solid State Disk
          2. The Economics of Disk
          3. SSD-Enabled Databases
        2. In-Memory Databases
          1. TimesTen
          2. Redis
          3. SAP HANA
          4. VoltDB
          5. Oracle 12c “in-Memory Database”
        3. Berkeley Analytics Data Stack and Spark
          1. Spark Architecture
        4. Conclusion
        5. Note
    11. Part II: The Gory Details
      1. Chapter 8: Distributed Database Patterns
        1. Distributed Relational Databases
          1. Replication
          2. Shared Nothing and Shared Disk
        2. Nonrelational Distributed Databases
        3. MongoDB Sharding and Replication
          1. Sharding
          2. Sharding Mechanisms
          3. Cluster Balancing
          4. Replication
          5. Write Concern and Read Preference
        4. HBase
          1. Tables, Regions, and RegionServers
          2. Caching and Data Locality
          3. Rowkey Ordering
          4. RegionServer Splits, Balancing, and Failure
          5. Region Replicas
        5. Cassandra
          1. Gossip
          2. Consistent Hashing
          3. Replicas
          4. Snitches
        6. Summary
      2. Chapter 9: Consistency Models
        1. Types of Consistency
          1. ACID and MVCC
          2. Global Transaction Sequence Numbers
          3. Two-phase Commit
          4. Other Levels of Consistency
        2. Consistency in MongoDB
          1. MongoDB Locking
          2. Replica Sets and Eventual Consistency
        3. HBase Consistency
          1. Eventually Consistent Region Replicas
        4. Cassandra Consistency
          1. Replication Factor
          2. Write Consistency
          3. Read Consistency
          4. Interaction between Consistency Levels
          5. Hinted Handoff and Read Repair
          6. Timestamps and Granularity
          7. Vector Clocks
          8. Lightweight Transactions
        5. Conclusion
      3. Chapter 10: Data Models and Storage
        1. Data Models
          1. Review of the Relational Model of Data
          2. Key-value Stores
          3. Data Models in BigTable and HBase
          4. Cassandra
          5. JSON Data Models
        2. Storage
          1. Typical Relational Storage Model
          2. Log-structured Merge Trees
          3. Secondary Indexing
        3. Conclusion
      4. Chapter 11: Languages and Programming Interfaces
        1. SQL
        2. NoSQL APIs
          1. Riak
          2. HBase
          3. MongoDB
          4. Cassandra Query Language (CQL)
          5. MapReduce
          6. Pig
          7. Directed Acyclic Graphs
          8. Cascading
          9. Spark
        3. The Return of SQL
          1. Hive
          2. Impala
          3. Spark SQL
          4. Couchbase N1QL
          5. Apache Drill
          6. Other SQL on NoSQL
        4. Conclusion
        5. Note
      5. Chapter 12: Databases of the Future
        1. The Revolution Revisited
        2. Counterrevolutionaries
          1. Have We Come Full Circle?
          2. An Embarrassment of Choice
        3. Can We Have It All?
          1. Consistency Models
          2. Schema
          3. Database Languages
          4. Storage
          5. A Vision for a Converged Database
        4. Meanwhile, Back at Oracle HQ ...
          1. Oracle JSON Support
          2. Accessing JSON via Oracle REST
          3. REST Access to Oracle Tables
          4. Oracle Graph
          5. Oracle Sharding
          6. Oracle as a Hybrid Database
        5. Other Convergent Databases
        6. Disruptive Database Technologies
          1. Storage Technologies
          2. Blockchain
          3. Quantum Computing
        7. Conclusion
        8. Notes
      6. Appendix A: Database Survey
        1. Aerospike
        2. Cassandra
        3. CouchBase
        4. DynamoDB
        5. HBase
        6. MarkLogic
        7. MongoDB
        8. Neo4j
        9. NuoDB
        10. Oracle RDBMS
        11. Redis
        12. Riak
        13. SAP HANA
        14. TimesTen
        15. Vertica
        16. VoltDB
    12. Index