You are previewing CouchDB: The Definitive Guide.

CouchDB: The Definitive Guide

Cover of CouchDB: The Definitive Guide by J. Chris Anderson... Published by O'Reilly Media, Inc.
  1. CouchDB: The Definitive Guide
  2. Dedication
  3. SPECIAL OFFER: Upgrade this ebook with O’Reilly
  4. Foreword
  5. Preface
    1. Using Code Examples
    2. Conventions Used in This Book
    3. Safari® Books Online
    4. How to Contact Us
    5. Acknowledgments
      1. J. Chris
      2. Jan
      3. Noah
  6. I. Introduction
    1. 1. Why CouchDB?
      1. Relax
      2. A Different Way to Model Your Data
      3. A Better Fit for Common Applications
      4. Building Blocks for Larger Systems
      5. Local Data Is King
      6. Wrapping Up
    2. 2. Eventual Consistency
      1. Working with the Grain
      2. The CAP Theorem
      3. Local Consistency
      4. Distributed Consistency
      5. Wrapping Up
    3. 3. Getting Started
      1. All Systems Are Go!
      2. Welcome to Futon
      3. Your First Database and Document
      4. Running a Query Using MapReduce
      5. Triggering Replication
      6. Wrapping Up
    4. 4. The Core API
      1. Server
      2. Databases
      3. Documents
      4. Replication
      5. Wrapping Up
  7. II. Developing with CouchDB
    1. 5. Design Documents
      1. Document Modeling
      2. The Query Server
      3. Applications Are Documents
      4. A Basic Design Document
      5. Looking to the Future
    2. 6. Finding Your Data with Views
      1. What Is a View?
      2. Efficient Lookups
      3. The View to Get Comments for Posts
      4. Reduce/Rereduce
      5. Wrapping Up
    3. 7. Validation Functions
      1. Document Validation Functions
      2. Validation’s Context
      3. Writing One
      4. Wrapping Up
    4. 8. Show Functions
      1. The Show Function API
      2. Side Effect–Free
      3. Design Documents
      4. Querying Show Functions
      5. Etags
      6. Functions and Templates
      7. Learning Shows
      8. Using Templates
      9. Writing Templates
    5. 9. Transforming Views with List Functions
      1. Arguments to the List Function
      2. An Example List Function
      3. List Theory
      4. Querying Lists
      5. Lists, Etags, and Caching
  8. III. Example Application
    1. 10. Standalone Applications
      1. Use the Correct Version
      2. Portable JavaScript
      3. Applications Are Documents
      4. Standalone
      5. In the Wild
      6. Wrapping Up
    2. 11. Managing Design Documents
      1. Working with the Example Application
      2. Installing CouchApp
      3. Using CouchApp
      4. Download the Sofa Source Code
      5. Deploying Sofa
      6. Set Up Your Admin Account
      7. Configuring CouchApp with .couchapprc
    3. 12. Storing Documents
      1. JSON Document Format
      2. Beyond _id and _rev: Your Document Data
      3. The Edit Page
      4. Saving a Document
      5. Wrapping Up
    4. 13. Showing Documents in Custom Formats
      1. Rendering Documents with Show Functions
      2. Dynamic Dates
    5. 14. Viewing Lists of Blog Posts
      1. Map of Recent Blog Posts
      2. Rendering the View as HTML Using a List Function
  9. IV. Deploying CouchDB
    1. 15. Scaling Basics
      1. Scaling Read Requests
      2. Scaling Write Requests
      3. Scaling Data
      4. Basics First
    2. 16. Replication
      1. The Magic
      2. Simple Replication with the Admin Interface
      3. Replication in Detail
      4. Continuous Replication
      5. That’s It?
    3. 17. Conflict Management
      1. The Split Brain
      2. Conflict Resolution by Example
      3. Working with Conflicts
      4. Deterministic Revision IDs
      5. Wrapping Up
    4. 18. Load Balancing
      1. Having a Backup
    5. 19. Clustering
      1. Introducing CouchDB Lounge
      2. Consistent Hashing
      3. Growing the Cluster
  10. V. Reference
    1. 20. Change Notifications
      1. Polling for Changes
      2. Long Polling
      3. Continuous Changes
      4. Filters
      5. Wrapping Up
    2. 21. View Cookbook for SQL Jockeys
      1. Using Views
      2. Look Up by Key
      3. Look Up by Prefix
      4. Aggregate Functions
      5. Get Unique Values
      6. Enforcing Uniqueness
    3. 22. Security
      1. The Admin Party
      2. Basic Authentication
      3. Cookie Authentication
      4. Network Server Security
    4. 23. High Performance
      1. Good Benchmarks Are Non-Trivial
      2. High Performance CouchDB
      3. Bulk Inserts and Mostly Monotonic DocIDs
      4. Bulk Document Inserts
      5. Batch Mode
      6. Single Document Inserts
      7. Hovercraft
      8. Trade-Offs
    5. 24. Recipes
      1. Banking
      2. Ordering Lists
      3. Pagination
  11. VI. Appendixes
    1. A. Installing on Unix-like Systems
      1. Debian GNU/Linux
      2. Ubuntu
      3. Gentoo Linux
      4. Problems
    2. B. Installing on Mac OS X
      1. CouchDBX
      2. Homebrew
      3. MacPorts
    3. C. Installing on Windows
    4. D. Installing from Source
      1. Dependencies
      2. Installing
      3. Security Considerations
      4. Running Manually
      5. Running As a Daemon
      6. Troubleshooting
    5. E. JSON Primer
      1. Data Types
    6. F. The Power of B-trees
  12. Index
  13. About the Authors
  14. Colophon
  15. SPECIAL OFFER: Upgrade this ebook with O’Reilly
  16. Copyright
O'Reilly logo

Appendix F. The Power of B-trees

CouchDB uses a data structure called a B-tree to index its documents and views. We’ll look at B-trees enough to understand the types of queries they support and how they are a good fit for CouchDB.

This is our first foray into CouchDB internals. To use CouchDB, you don’t need to know what’s going on under the hood, but if you understand how CouchDB performs its magic, you’ll be able to pull tricks of your own. Additionally, if you understand the consequences of the ways you are using CouchDB, you will end up with smarter systems.

If you weren’t looking closely, CouchDB would appear to be a B-tree manager with an HTTP interface.


CouchDB is actually using a B+ tree, which is a slight variation of the B-tree that trades a bit of (disk) space for speed. When we say B-tree, we mean CouchDB’s B+ tree.

A B-tree is an excellent data structure for storing huge amounts of data for fast retrieval. When there are millions and billions of items in a B-tree, that’s when they get fun. B-trees are usually a shallow but wide data structure. While other trees can grow very high, a typical B-tree has a single-digit height, even with millions of entries. This is particularly interesting for CouchDB, where the leaves of the tree are stored on a slow medium such as a hard drive. Accessing any part of the tree for reading or writing requires visiting only a few nodes, which translates to a few head seeks (which are what make a hard drive slow), and because the operating system is likely to cache the upper tree nodes anyway, only the seek to the final leaf node is needed.

From a practical point of view, B-trees, therefore, guarantee an access time of less than 10 ms even for extremely large datasets.

Dr. Rudolf Bayer, inventor of the B-tree

CouchDB’s B-tree implementation is a bit different from the original. While it maintains all of the important properties, it adds Multi-Version Concurrency Control (MVCC) and an append-only design. B-trees are used to store the main database file as well as view indexes. One database is one B-tree, and one view index is one B-tree.

MVCC allows concurrent reads and writes without using a locking system. Writes are serialized, allowing only one write operation at any point in time for any single database. Write operations do not block reads, and there can be any number of read operations at any time. Each read operation is guaranteed a consistent view of the database. How this is accomplished is at the core of CouchDB’s storage model.

The short answer is that because CouchDB uses append-only files, the B-tree root node must be rewritten every time the file is updated. However, old portions of the file will never change, so every old B-tree root, should you happen to have a pointer to it, will also point to a consistent snapshot of the database.

Early in the book we explained how the MVCC system uses the document’s _rev value to ensure that only one person can change a document version. The B-tree is used to look up the existing _rev value for comparison. By the time a write is accepted, the B-tree can expect it to be an authoritative version.

Since old versions of documents are not overwritten or deleted when new versions come in, requests that are reading a particular version do not care if new ones are written at the same time. With an often changing document, there could be readers reading three different versions at the same time. Each version was the latest one when a particular client started reading it, but new versions were being written. From the point when a new version is committed, new readers will read the new version while old readers keep reading the old version.

In a B-tree, data is kept only in leaf nodes. CouchDB B-trees append data only to the database file that keeps the B-tree on disk and grows only at the end. Add a new document? The file grows at the end. Delete a document? That gets recorded at the end of the file. The consequence is a robust database file. Computers fail for plenty of reasons, such as power loss or failing hardware. Since CouchDB does not overwrite any existing data, it cannot corrupt anything that has been written and committed to disk already. See Figure F-1.

Committing is the process of updating the database file to reflect changes. This is done in the file footer, which is the last 4k of the database file. The footer is 2k in size and written twice in succession. First, CouchDB appends any changes to the file and then records the file’s new length in the first database footer. It then force-flushes all changes to disk. It then copies the first footer over to the second 2k of the file and force-flushes again.

Flat B-tree and append-only

Figure F-1. Flat B-tree and append-only

If anywhere in this process a problem occurs—say, power is cut off and CouchDB is restarted later—the database file is in a consistent state and doesn’t need a checkup. CouchDB starts reading the database file backward. When it finds a footer pair, it makes some checks: if the first 2k are corrupt (a footer includes a checksum), CouchDB replaces it with the second footer and all is well. If the second footer is corrupt, CouchDB copies the first 2k over and all is well again. Only once both footers are flushed to disk successfully will CouchDB acknowledge that a write operation was successful. Data is never lost, and data on disk is never corrupted. This design is the reason for CouchDB having no off switch. You just terminate it when you are done.

There’s a lot more to say about B-trees in general, and if and how SSDs change the runtime behavior. The Wikipedia article on B-trees is a good starting point for further investigations. Scholarpedia includes notes by Dr. Rudolf Bayer, inventor of the B-tree.

The best content for your career. Discover unlimited learning on demand for around $1/day.