You are previewing CouchDB: The Definitive Guide.

CouchDB: The Definitive Guide

Cover of CouchDB: The Definitive Guide by J. Chris Anderson... Published by O'Reilly Media, Inc.
  1. CouchDB: The Definitive Guide
  2. Dedication
  3. SPECIAL OFFER: Upgrade this ebook with O’Reilly
  4. Foreword
  5. Preface
    1. Using Code Examples
    2. Conventions Used in This Book
    3. Safari® Books Online
    4. How to Contact Us
    5. Acknowledgments
      1. J. Chris
      2. Jan
      3. Noah
  6. I. Introduction
    1. 1. Why CouchDB?
      1. Relax
      2. A Different Way to Model Your Data
      3. A Better Fit for Common Applications
      4. Building Blocks for Larger Systems
      5. Local Data Is King
      6. Wrapping Up
    2. 2. Eventual Consistency
      1. Working with the Grain
      2. The CAP Theorem
      3. Local Consistency
      4. Distributed Consistency
      5. Wrapping Up
    3. 3. Getting Started
      1. All Systems Are Go!
      2. Welcome to Futon
      3. Your First Database and Document
      4. Running a Query Using MapReduce
      5. Triggering Replication
      6. Wrapping Up
    4. 4. The Core API
      1. Server
      2. Databases
      3. Documents
      4. Replication
      5. Wrapping Up
  7. II. Developing with CouchDB
    1. 5. Design Documents
      1. Document Modeling
      2. The Query Server
      3. Applications Are Documents
      4. A Basic Design Document
      5. Looking to the Future
    2. 6. Finding Your Data with Views
      1. What Is a View?
      2. Efficient Lookups
      3. The View to Get Comments for Posts
      4. Reduce/Rereduce
      5. Wrapping Up
    3. 7. Validation Functions
      1. Document Validation Functions
      2. Validation’s Context
      3. Writing One
      4. Wrapping Up
    4. 8. Show Functions
      1. The Show Function API
      2. Side Effect–Free
      3. Design Documents
      4. Querying Show Functions
      5. Etags
      6. Functions and Templates
      7. Learning Shows
      8. Using Templates
      9. Writing Templates
    5. 9. Transforming Views with List Functions
      1. Arguments to the List Function
      2. An Example List Function
      3. List Theory
      4. Querying Lists
      5. Lists, Etags, and Caching
  8. III. Example Application
    1. 10. Standalone Applications
      1. Use the Correct Version
      2. Portable JavaScript
      3. Applications Are Documents
      4. Standalone
      5. In the Wild
      6. Wrapping Up
    2. 11. Managing Design Documents
      1. Working with the Example Application
      2. Installing CouchApp
      3. Using CouchApp
      4. Download the Sofa Source Code
      5. Deploying Sofa
      6. Set Up Your Admin Account
      7. Configuring CouchApp with .couchapprc
    3. 12. Storing Documents
      1. JSON Document Format
      2. Beyond _id and _rev: Your Document Data
      3. The Edit Page
      4. Saving a Document
      5. Wrapping Up
    4. 13. Showing Documents in Custom Formats
      1. Rendering Documents with Show Functions
      2. Dynamic Dates
    5. 14. Viewing Lists of Blog Posts
      1. Map of Recent Blog Posts
      2. Rendering the View as HTML Using a List Function
  9. IV. Deploying CouchDB
    1. 15. Scaling Basics
      1. Scaling Read Requests
      2. Scaling Write Requests
      3. Scaling Data
      4. Basics First
    2. 16. Replication
      1. The Magic
      2. Simple Replication with the Admin Interface
      3. Replication in Detail
      4. Continuous Replication
      5. That’s It?
    3. 17. Conflict Management
      1. The Split Brain
      2. Conflict Resolution by Example
      3. Working with Conflicts
      4. Deterministic Revision IDs
      5. Wrapping Up
    4. 18. Load Balancing
      1. Having a Backup
    5. 19. Clustering
      1. Introducing CouchDB Lounge
      2. Consistent Hashing
      3. Growing the Cluster
  10. V. Reference
    1. 20. Change Notifications
      1. Polling for Changes
      2. Long Polling
      3. Continuous Changes
      4. Filters
      5. Wrapping Up
    2. 21. View Cookbook for SQL Jockeys
      1. Using Views
      2. Look Up by Key
      3. Look Up by Prefix
      4. Aggregate Functions
      5. Get Unique Values
      6. Enforcing Uniqueness
    3. 22. Security
      1. The Admin Party
      2. Basic Authentication
      3. Cookie Authentication
      4. Network Server Security
    4. 23. High Performance
      1. Good Benchmarks Are Non-Trivial
      2. High Performance CouchDB
      3. Bulk Inserts and Mostly Monotonic DocIDs
      4. Bulk Document Inserts
      5. Batch Mode
      6. Single Document Inserts
      7. Hovercraft
      8. Trade-Offs
    5. 24. Recipes
      1. Banking
      2. Ordering Lists
      3. Pagination
  11. VI. Appendixes
    1. A. Installing on Unix-like Systems
      1. Debian GNU/Linux
      2. Ubuntu
      3. Gentoo Linux
      4. Problems
    2. B. Installing on Mac OS X
      1. CouchDBX
      2. Homebrew
      3. MacPorts
    3. C. Installing on Windows
    4. D. Installing from Source
      1. Dependencies
      2. Installing
      3. Security Considerations
      4. Running Manually
      5. Running As a Daemon
      6. Troubleshooting
    5. E. JSON Primer
      1. Data Types
    6. F. The Power of B-trees
  12. Index
  13. About the Authors
  14. Colophon
  15. SPECIAL OFFER: Upgrade this ebook with O’Reilly
  16. Copyright

Chapter 9. Transforming Views with List Functions

Just as show functions convert documents to arbitrary output formats, CouchDB list functions allow you to render the output of view queries in any format. The powerful iterator API allows for flexibility to filter and aggregate rows on the fly, as well as output raw transformations for an easy way to make Atom feeds, HTML lists, CSV files, config files, or even just modified JSON.

List functions are stored under the lists field of a design document. Here’s an example design document that contains two list functions:

  "_id" : "_design/foo",
  "_rev" : "1-67at7bg",
  "lists" : {
    "bar" : "function(head, req) { var row; while (row = getRow()) { ... } }",
    "zoom" : "function() { return 'zoom!' }",

Arguments to the List Function

The function is called with two arguments, which can sometimes be ignored, as the row data itself is loaded during function execution. The first argument, head, contains information about the view. Here’s what you might see looking at a JSON representation of head:

{total_rows:10, offset:0}

The request itself is a much richer data structure. This is the same request object that is available to show, update, and filter functions. We’ll go through it in detail here as a reference. Here’s the example req object:

  "info": {
    "db_name": "test_suite_db","doc_count": 11,"doc_del_count": 0,
    "update_seq": 11,"purge_seq": 0,"compact_running": false,"disk_size": 4930,
    "instance_start_time": "1250046852578425","disk_format_version": 4},

The database information, as available in an information request against a database’s URL, is included in the request parameters. This allows you to stamp rendered rows with an update sequence and know the database you are working with.

  "method": "GET",
  "path": ["test_suite_db","_design","lists","_list","basicJSON","basicView"],

The HTTP method and the path in the client from the client request are useful, especially for rendering links to other resources within the application.

  "query": {"foo":"bar"},

If there are parameters in the query string (in this case corresponding to ?foo=bar), they will be parsed and available as a JSON object at req.query.

    {"Accept": "text/html,application/xhtml+xml ,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.7","Accept-Encoding": 
    "gzip,deflate","Accept-Language": "en-us,en;q=0.5","Connection": "keep-alive",
    "Cookie": "_x=95252s.sd25; AuthSession=","Host": "",
    "Keep-Alive": "300",
    "Referer": "",
    "User-Agent": "Mozilla/5.0 Gecko/20090729 Firefox/3.5.2"},
  "cookie": {"_x": "95252s.sd25","AuthSession": ""},

Headers give list and show functions the ability to provide the Content-Type response that the client prefers, as well as other nifty things like cookies. Note that cookies are also parsed into a JSON representation. Thanks, MochiWeb!

  "body": "undefined",
  "form": {},

In the case where the method is POST, the request body (and a form-decoded JSON representation of it, if applicable) are available as well.

  "userCtx": {"db": "test_suite_db","name": null,"roles": ["_admin"]}

Finally, the userCtx is the same as that sent to the validation function. It provides access to the database the user is authenticated against, the user’s name, and the roles they’ve been granted. In the previous example, you see an anonymous user working with a CouchDB node that is in “admin party” mode. Unless an admin is specified, everyone is an admin.

That’s enough about the arguments to list functions. Now it’s time to look at the mechanics of the function itself.

An Example List Function

Let’s put this knowledge to use. In the chapter introduction, we mentioned using lists to generate config files. One fun thing about this is that if you keep your configuration information in CouchDB and generate it with lists, you don’t have to worry about being able to regenerate it again, because you know the config will be generated by a pure function from your database and not other sources of information. This level of isolation will ensure that your config files can be generated correctly as long as CouchDB is running. Because you can’t fetch data from other system services, files, or network sources, you can’t accidentally write a config file generator that fails due to external factors.


J. Chris got excited about the idea of using list functions to generate config files for the sort of services people usually configure using CouchDB, specifically via Chef, an Apache-licensed infrastructure automation tool. The key feature of infrastructure automation is that deployment scripts are idempotent—that is, running your scripts multiple times will have the same intended effect as running them once, something that becomes critical when a script fails halfway through. This encourages crash-only design, where your scripts can bomb out multiple times but your data remains consistent, because it takes the guesswork out of provisioning and updating servers in the case of previous failures.

Like map, reduce, and show functions, lists are pure functions, from a view query and an HTTP request to an output format. They can’t make queries against remote services or otherwise access outside data, so you know they are repeatable. Using a list function to generate an HTTP server configuration file ensures that the configuration is generated repeatably, based on only the state of the database.

Imagine you are running a shared hosting platform, with one name-based virtual host per user. You’ll need a config file that starts out with some node configuration (which modules to use, etc.) and is followed by one config section per user, setting things like the user’s HTTP directory, subdomain, forwarded ports, etc.

function(head, req) {
  // helper function definitions would be here...
  var row, userConf, configHeader, configFoot;
  configHeader = renderTopOfApacheConf(head, req.query.hostname);

In the first block of the function, we’re rendering the top of the config file using the function renderTopOfApacheConf(head, req.query.hostname). This may include information that’s posted into the function, like the internal name of the server that is being configured or the root directory in which user HTML files are organized. We won’t show the function body, but you can imagine that it would return a long multi-line string that handles all the global configuration for your server and sets the stage for the per-user configuration that will be based on view data.

The call to send(configHeader) is the heart of your ability to render text using list functions. Put simply, it just sends an HTTP chunk to the client, with the content of the strings pasted to it. There is some batching behind the scenes, as CouchDB speaks with the JavaScript runner with a synchronous protocol, but from the perspective of a programmer, send() is how HTTP chunks are born.

Now that we’ve rendered and sent the file’s head, it’s time to start rendering the list itself. Each list item will be the result of converting a view row to a virtual host’s configuration element. The first thing we do is call getRow() to get a row of the view.

  while (row = getRow()) {
    var userConf = renderUserConf(row);

The while loop used here will continue to run until getRow() returns null, which is how CouchDB signals to the list function that all valid rows (based on the view query parameters) have been exhausted. Before we get ahead of ourselves, let’s check out what happens when we do get a row.

In this case, we simply render a string based on the row and send it to the client. Once all rows have been rendered, the loop is complete. Now is a good time to note that the function has the option to return early. Perhaps it is programmed to stop iterating when it sees a particular user’s document or is based on a tally it’s been keeping of some resource allocated in the configuration. In those cases, the loop can end early with a break statement or other method. There’s no requirement for the list function to render every row that is sent to it.

  configFoot = renderConfTail();
  return configFoot;

Finally, we close out the configuration file and return the final string value to be sent as the last HTTP chunk. The last action of a list function is always to return a string, which will be sent as the final HTTP chunk to the client.

To use our config file generation function in practice, we might run a command-line script that looks like:

curl http://localhost:5984/config_db/_design/files/_list/apache/users?hostname=foobar 
> apache.conf

This will render our Apache config based on data in the user’s view and save it to a file. What a simple way to build a reliable configuration generator!

List Theory

Now that we’ve seen a complete list function, it’s worth mentioning some of the helpful properties they have.

The most obvious thing is the iterator-style API. Because each row is loaded independently by calling getRow(), it’s easy not to leak memory. The list function API is capable of rendering lists of arbitrary length without error, when used correctly.

On the other hand, this API gives you the flexibility to bundle a few rows in a single chunk of output, so if you had a view of, say, user accounts, followed by subdomains owned by that account, you could use a slightly more complex loop to build up some state in the list function for rendering more complex chunks. Let’s look at an alternate loop section:

var subdomainOwnerRow, subdomainRows = [];
while (row = getRow()) {

We’ve entered a loop that will continue until we have reached the endkey of the view. The view is structured so that a user profile row is emitted, followed by all of that user’s subdomains. We’ll use the profile data and the subdomain information to template the configuration for each individual user. This means we can’t render any subdomain configuration until we know we’ve received all the rows for the current user.

  if (!subdomainOwnerRow) {
    subdomainOwnerRow = row;

This case is true only for the first user. We’re merely setting up the initial conditions.

  } else if (row.value.user != subdomainOwnerRow.value.user) {

This is the end case. It will be called only after all the subdomain rows for the current user have been exhausted. It is triggered by a row with a mismatched user, indicating that we have all the subdomain rows.

    send(renderUserConf(subdomainOwnerRow, subdomainRows));

We know we are ready to render everything for the current user, so we pass the profile row and the subdomain rows to a render function (which nicely hides all the gnarly nginx config details from our fair reader). The result is sent to the HTTP client, which writes it to the config file.

    subdomainRows = [];
    subdomainOwnerRow = row;

We’ve finished with that user, so let’s clear the rows and start working on the next user.

  } else {

Ahh, back to work, collecting rows.

send(renderUserConf(subdomainOwnerRow, subdomainRows));

This last bit is tricky—after the loop is finished (we’ve reached the end of the view query), we’ve still got to render the last user’s config. Wouldn’t want to forget that!

The gist of this loop section is that we collect rows that belong to a particular user until we see a row that belongs to another user, at which point we render output for the first user, clear our state, and start working with the new user. Techniques like this show how much flexibility is allowed by the list iterator API.

More uses along these lines include filtering rows that should be hidden from a particular result set, finding the top N grouped reduce values (e.g., to sort a tag cloud by popularity), and even writing custom reduce functions (as long as you don’t mind that reductions are not stored incrementally).

Querying Lists

We haven’t looked in detail at the ways list functions are queried. Just like show functions, they are resources available on the design document. The basic path to a list function is as follows:


Because the list name and the view name are both specified, this means it is possible to render a list against more than one view. For instance, you could have a list function that renders blog comments in the Atom XML format, and then run it against both a global view of recent comments as well as a view of recent comments by blog post. This would allow you to use the same list function to provide an Atom feed for comments across an entire site, as well as individual comment feeds for each post.

After the path to the list comes the view query parameter. Just like a regular view, calling a list function without any query parameters results in a list that reflects every row in the view. Most of the time you’ll want to call it with query parameters to limit the returned data.

You’re already familiar with the view query options from Chapter 6. The same query options apply to the _list query. Let’s look at URLs side by side; see Example 9-1.

Example 9-1. A JSON view query

GET /db/_design/sofa/_view/recent-posts?descending=true&limit=10

This view query is just asking for the 10 most recent blog posts. Of course, this query could include parameters like startkey or skip—we’re leaving them out for simplicity. To run the same query through a list function, we access it via the list resource, as shown in Example 9-2.

Example 9-2. The HTML list query

GET /db/_design/sofa/_list/index/recent-posts?descending=true&limit=10

The index list here is a function from JSON to HTML. Just like the preceding view query, additional query parameters can be applied to paginate through the list. As we’ll see in Part III, once you have a working list, adding pagination is trivial. See Example 9-3.

Example 9-3. The Atom list query

GET /db/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The list function can also look at the query parameters and do things like switch that output to render based on parameters. You can even do things like pass the username into the list using a query parameter (but it’s not recommended, as you’ll ruin cache efficiency).

Lists, Etags, and Caching

Just like show functions and view queries, lists are sent with proper HTTP Etags, which makes them cacheable by intermediate proxies. This means that if your server is starting to bog down in list-rendering code, it should be possible to relieve load by using a caching reverse proxy like Squid. We won’t go into the details of Etags and caching here, as they were covered in Chapter 8.

The best content for your career. Discover unlimited learning on demand for around $1/day.