Chapter 4. Getting Data In and Out

With Couchbase Server installed and your chosen client library up and running, it is time to start storing information in the database and retrieving it. All of the client libraries use the same basic operations, although the function or method names may differ. Despite the differences between individual client libraries, the basic operations, and how and why you use them, are the same.

A basic list of the key Create, Retrieve, Update, and Delete (CRUD) operations is shown in Table 4-1.

Table 4-1. Key CRUD operations
Operation                              Description
add(id, document [, expiry])           Add an item if ID doesn’t already exist
set(id, document [, expiry])           Store a document with ID
replace(id, document [, expiry])       Update the document for an existing ID
cas(id, document, check [, expiry])    Update the document for an existing ID providing the check matches
get(id)                                Get the specified document
incr(id [, offset])                    Increment the document value by offset
decr(id [, offset])                    Decrement the document value by offset
append(id, value)                      Append the content to the end of the current document
prepend(id, value)                     Prepend the content to the beginning of the current document
delete(id)                             Delete the specified document

The operations can, however, be combined in common sequences to solve different problems. Let’s look at some of these in the context of a typical web application.

Basic Interface

To start, let’s look at a basic Ruby script that stores some data, retrieves it, and handles the basic responses. A sample program, hello-world.rb, is shown in Example 4-1.

Example 4-1. Hello World!
require 'rubygems'
require 'couchbase'

client = Couchbase.new "http://127.0.0.1:8091/pools/default"
client.quiet = false
begin
  spoon = client.get "spoon"
  puts spoon
rescue Couchbase::Error::NotFound => e
  puts "There is no spoon."
  client.set "spoon", "Hello World!", :ttl => 10
end

Dissecting the script reveals the set and get process common in Couchbase:

  • The first two lines load the necessary libraries.

  • The next line opens up a connection to your Couchbase Server cluster. Remember that you need to connect to only one node in the cluster. There could be 20 or just one node in the cluster we are connecting to in the example.

    The connection is defined by a URL that should point to at least one node within your cluster. In this example, the localhost address is used. The remainder of the URL is the pools definition, which remains the same.

  • The remainder of the script performs a retrieve and store operation. If the initial retrieve operation (for the document ID "spoon") fails, then we set the data into the database. If the document ID does exist, the script prints out the stored value.

Note

When connecting to the “default” bucket, you do not need to supply any credentials. For all other buckets, you must provide the username (the bucketname), even if no password is specified.

You can test this script out by running it from the command line. The first time you run it, it should output this error string:

shell> ruby hello-world.rb
There is no spoon.

The specified document does not exist in the database, but is added after the error string has been printed. The second time you run it, you should get the stored document value:

shell> ruby hello-world.rb
Hello World!

As an additional demonstration, the welcome string stored has been given an expiry value of 10 seconds. This means that if you wait longer than 10 seconds after you have stored the value, the value will be deleted from the database. If you wait more than 10 seconds from the first time you ran the script and execute the script again, it should output this error string:

shell> ruby hello-world.rb
There is no spoon.

For comparison, the same script, written in PHP:

<?php

$cb = new Couchbase("127.0.0.1:8091", "", "", "default");

$spoon = $cb->get("spoon");

if ($spoon) {
  echo "$spoon";
}
else {
  echo "There is no spoon.";
  $cb->set("spoon", "Hello World!", 10);
}

?>

Although this is a very basic example, it demonstrates the simplicity of storing information in Couchbase Server and retrieving it, and the expiry of information.

The basic sequence shown here is one that will be replicated in many different places in your code, and it’s important to note the simplicity of the document-based interface. Note that there are no complex SQL statements to write, no need to worry about creating a structure to hold the data, or any need to worry about the size and complexity of the cluster holding the information.

Now let’s start looking at some more specific examples of different areas of the Couchbase Server interface.

Document Identifiers

Data is stored by recording a block of data against a given document ID. Because the document ID is the primary method for retrieving, and updating, the information that you store, some care needs to be taken to choose your document ID.

The document ID is required, and Couchbase will not automatically create one for you if an explicit ID is not specified. It should go without saying that document IDs should also be unique; you can only retrieve a single document back from a given ID. Use the same ID and you will overwrite the existing information.

There are a number of different strategies available:

Convert a field to a unique ID

Certain fields or values make good document IDs. For example, if you are storing session data, or even user data, you can use the unique session ID or the user’s email address as a suitable document ID. For other data, such as a user’s name (there are numerous Martin Browns out there), or recipe name (multiple varieties of Lasagne) a different ID structure should be used.

Store objects and sequences

You can make use of the raw numerical storage and the increment operation to create a sequence ID. Create a document, “recipe-sequence” and store a bare integer. Each time you create a new recipe, increment the value, and append that to the end of a string, such as “recipe_”. Because the increment operation is atomic, the number will increment and should never be repeated.

Use a UUID

There are numerous UUID solutions available in different languages that can be used to create a unique identifier for the purpose of storing a document. UUIDs make the process of storing the information very simple and straightforward.

Of course, there is nothing to stop you using a combination of elements from all three. For example, you might store global data into “named” documents that you can easily identify and explicitly reference in your code, while storing user data in sequenced documents.

Another best practice, regardless of which document ID structure you use, is to prefix your document ID with its type, for example, user_98475894756 and object_3978547645. When storing JSON, consider using a type field as well to make it clear what the document type is.
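The sequence and UUID strategies, combined with a type prefix, can be sketched in plain Ruby. This is only an illustration: the COUNTERS hash is a stand-in for a counter document such as “recipe-sequence”, and the method names next_recipe_id and new_user_id are hypothetical. With a real client, the increment would be a single atomic server-side operation rather than a local hash update.

```ruby
require 'securerandom'

# Stand-in for a counter document held on the server (an assumption for
# this sketch; a local hash is not atomic across clients).
COUNTERS = Hash.new(0)

# Sequence strategy: bump the counter and append it to a type prefix.
def next_recipe_id
  COUNTERS['recipe-sequence'] += 1
  "recipe_#{COUNTERS['recipe-sequence']}"
end

# UUID strategy: a random identifier with a type prefix.
def new_user_id
  "user_#{SecureRandom.uuid}"
end

puts next_recipe_id
puts next_recipe_id
puts new_user_id
```

The sequence IDs stay short, which keeps the in-memory metadata small, while the UUID form is longer but needs no shared counter.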

Tip

Be careful with the length of your document IDs. All document metadata, including the document ID, is stored in memory at all times, so longer IDs make for a larger RAM footprint, which reduces the RAM available for caching the document data.

One area to be careful of is how you split up and define the document and data. We look at that in more detail in Chapter 3.

Time to Live (TTL)

As we saw in the opening example, you can store documents with an expiry time. All values stored within Couchbase Server have an optional expiry, or Time to Live (TTL) value. The expiry is designed for use on data that has a natural lifespan, such as session data or shopping baskets. The expiry time is configured using a simple numerical value, and the value is interpreted differently according to its size:

  • Any value smaller than 30 days (i.e., 30 * 24 * 60 * 60) is taken as a relative value in seconds. For example, storing a document with an expiry of 600 will expire the document in 10 minutes.

  • Any value larger than this is taken as an absolute value from the epoch, also in seconds. For example, 1381921696 would be 16th October 2013.

  • A value of zero (or not supplying an expiry time), means that the document has no expiry. The document will need to be explicitly deleted to be removed from the database.
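The three rules above can be made concrete with a few lines of Ruby. The helper name expiry_to_time is hypothetical and not part of any Couchbase client; the server applies the rule itself, so this sketch only illustrates the arithmetic. How a value of exactly 30 days is treated is an assumption here.

```ruby
# Illustrative only: mirrors the expiry rule described above.
# expiry_to_time is a hypothetical helper, not a client library method.
THIRTY_DAYS = 30 * 24 * 60 * 60 # 2,592,000 seconds

def expiry_to_time(expiry, now = Time.now)
  return nil if expiry.zero?                    # zero: the document never expires
  return now + expiry if expiry < THIRTY_DAYS   # small value: seconds from now
  Time.at(expiry)                               # large value: seconds since the epoch
end

puts expiry_to_time(600).inspect          # ten minutes from the current time
puts expiry_to_time(1381921696).utc       # an absolute date in October 2013
puts expiry_to_time(0).inspect            # no expiry
```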

The expiry time can be set when initially storing the value and when updating it. For convenience, you can also get the value and “touch” the expiry to update it. This is useful for session and other information where retrieving the record also implies the data should stay around longer. You can also explicitly touch (without a get) the expiry on a document.

Storing Data

To start with, you need to store data into your database so that you can later retrieve it. There are two primary storage methods:

  • set(docid, docdata [, expiry])

    Store the specified document data against the document ID. This operation is an explicit set—that is, it will always store the document data against the ID you supply, irrespective of whether the document ID already exists, or what the corresponding value is.

    For example:

    $cb->set('message', 'Hello World!');

    This sets the document ‘message’ to ‘Hello World!’.

    The expiry is an optional value, although different client libraries expose this in different ways. In PHP it’s an optional argument which can be added to the end of our method call. For example, to add a ten second expiry:

    $cb->set('message', 'Hello World!', 10);

    You can test the success of the operation by checking the return value in PHP:

    if ($cb->set("spoon", "Hello World!", 10)) {
      echo "Message stored!";
    }

    In languages that support exceptions (Python, Ruby, .NET and Java) the exception system provides the information about whether individual operations succeed or not.

  • add(docid, docdata [, expiry])

    The add() function works differently from set(). With add() the operation will fail if the document ID you specify already exists. For example, in the code below, the first operation will complete successfully, the second will fail:

    $cb->add('message', 'Hello World!');
    $cb->add('message', 'I pushed the button, but nothing happened!');

    The add() function is useful when you are storing data into the database for the first time. For example, consider storing user data using the email address as the document ID. Using set() would overwrite an old user record. Using add() would fail the registration and indicate the user should use the password recovery system.

    The add() function is also atomic, so multiple writers can be adding data to the cluster at the same time, and it’s safe to use within a multi-threaded environment.
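The difference between the two storage methods can be sketched with a toy in-memory class. ToyStore and its KeyExists error are hypothetical names invented for this illustration; this is not the Couchbase client, and a real client reports the failure in its own way (a return value in PHP, an exception in Ruby).

```ruby
# A toy in-memory stand-in, NOT the Couchbase client. It only mimics the
# difference between set (always stores) and add (fails if the ID exists).
class ToyStore
  class KeyExists < StandardError; end

  def initialize
    @docs = {}
  end

  # Always stores, overwriting any existing document.
  def set(id, doc)
    @docs[id] = doc
  end

  # Stores only if the ID is not already present.
  def add(id, doc)
    raise KeyExists, "#{id} already exists" if @docs.key?(id)
    @docs[id] = doc
  end

  def get(id)
    @docs[id]
  end
end

store = ToyStore.new
store.add('mc@example.com', 'first registration')
begin
  store.add('mc@example.com', 'second registration')
rescue ToyStore::KeyExists
  puts 'Address already registered; offer password recovery.'
end
puts store.get('mc@example.com')
```

The second add is rejected, so the original user record survives, which is the registration behavior described above.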

Regardless of how you store the data, the actual format of the information you are storing is also important. For more details on designing data structures for document databases, see Chapter 3.

Retrieving Data

To retrieve a document from the database, you need only supply the document ID of the document that you want to retrieve. The retrieve operations are designed so that the get() operation will return an error if the corresponding document ID does not exist.

There is only one main function, get(). For example:

$message = $cb->get('message');

If the value comes back as undefined in PHP, then the specified document ID did not exist.

Retrieving in Bulk

There are times when you will have multiple different documents to collect. For example, if you’ve made use of Views, then you might have identified a number of documents that you want to retrieve. If you are using links within a document to other documents, you might want to load all of them at the same time.

As a rule, loading multiple documents in bulk is faster and more efficient than loading the documents individually, largely because the per-document latency of separate requests and responses is removed by streaming the response for multiple documents.

Different libraries implement and expose this functionality in different ways, but the result is generally the same; either an array of the documents that you requested, or if supported by the language, a hash of the document IDs and corresponding document data. For example, in PHP the function is:

$ret = $cb->getMulti(array('recipe1','recipe2'));

This returns a PHP associative array, with each key/value pair as the document ID and document data. The document data is undefined if the requested document could not be retrieved.

Updating Data

Updating data in Couchbase Server should be performed carefully. You can only update entire documents, and because of the basic structure—updating document data against a document ID—the format is similar to that used when initially storing a document. In fact, you can use the set() function mentioned earlier. The problem is that you may not want to update a document that doesn’t already exist. For this, you can use the replace() method. This only updates the document data if the specified document ID already exists. For example:

$cb->replace("welcome-message", "Hello World!");

The above will fail until we use either add() or set() to create the document ID.

Generally the process for updating information is to load the record, update the information in it, and then use replace() or set() to save it back again:

$message = $cb->get('message');
$message = $message . " How are you today?";
$cb->replace('message',$message);

When updating JSON-based documents, the operation is the same; you must load the existing document before updating a field and saving it back:

$record = array('name' => 'MC Brown', 'company' => 'Couchbase');
$cb->set('user', json_encode($record));

$newrecord = json_decode($cb->get('user'), true);
$newrecord["nickname"] = "MC";

$cb->replace('user', json_encode($newrecord));

Note

To ensure you are not overwriting data that might have been updated by another client, you should use the CAS operation.

The above example uses the built-in json_encode() and json_decode() functions to serialize an internal associative array within PHP into JSON for storage and back again.

Concurrent Updates

In any highly concurrent environment, particularly many modern websites, you must make sure that you do not try and update the same information from two or more clients simultaneously. This is even more important in an environment such as Couchbase Server where updates occur very quickly, and where you are storing larger documents that may contain a lot of compound information.

For example, consider the following scenario:

  1. Client A gets the value for the document “Martin”.

  2. Client B gets the value for the document “Martin”.

  3. Client A adds information to the document value and updates it.

  4. Client B adds information to the document value and updates it.

In the above sequence, the update by Client B will overwrite the information in the database, removing the data that Client A added.

To provide a solution to this, you can use the compare and swap (cas()) function, which uses an additional check value to ensure that the version of the document retrieved is the same as the one currently stored on the server.

The result is a change to the above sequence:

  1. Client A gets the value for the document “Martin” and the CAS ID.

  2. Client B gets the value for the document “Martin” and the CAS ID.

  3. Client A adds information to the document value and updates it, using the CAS ID as a check. The document is updated.

  4. Client B adds information to the document value and tries to update it using the CAS ID. The operation fails, because the cached CAS ID on client B is now different from the CAS ID on the server after the update by client A.

CAS therefore supports an additional level of checking and verifies that the information you are updating matches the copy of the information you originally retrieved. CAS enforces what Couchbase calls optimistic locking, that is, we hope that we are the only client performing an update that has the right CAS value, and that all clients always use a CAS function to do updates.

Within your code, CAS is a function just like the replace() function. Depending on your environment, you may need to use a special get function (gets()) that obtains both the document information and CAS value.

For example, within PHP you would update an existing document through CAS by first getting the value and stored CAS value, and then using the cas() method to update the document:

$value = $cb->get("customer", NULL, $casvalue);
$response = $cb->cas($casvalue, "customer", "new string value");

The limitation of using CAS is that it is not enforceable at a database level. If you want to use it for all the update operations, you must explicitly use it over the standard document update functions across your entire application. Care should be taken to use the operation in the right place. A CAS update is slower than a simple set operation, and to get the best performance you should use the fastest operation appropriate to your application and update requirements.
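The four-step Client A and Client B sequence described earlier in this section can be sketched with a toy in-memory class. ToyCasStore, CasMismatch, and the method names are hypothetical and exist only for this illustration; a real client obtains the CAS value from the server. The sketch also shows the usual recovery step: when the check fails, re-read the document and try again.

```ruby
# A toy in-memory model of compare and swap, NOT the Couchbase client.
class ToyCasStore
  class CasMismatch < StandardError; end

  def initialize
    @docs = {}      # id => [value, cas]
    @next_cas = 0
  end

  # Unconditional store; every write gives the document a new CAS value.
  def set(id, value)
    @docs[id] = [value, (@next_cas += 1)]
  end

  # Returns the value together with its current CAS value.
  def gets(id)
    @docs.fetch(id).dup
  end

  # Stores only if the supplied check matches the current CAS value.
  def cas(id, value, check)
    _, current = @docs.fetch(id)
    raise CasMismatch, "#{id} changed since it was read" unless current == check
    set(id, value)
  end
end

store = ToyCasStore.new
store.set('Martin', 'base')

value_a, cas_a = store.gets('Martin')          # 1. Client A reads
value_b, cas_b = store.gets('Martin')          # 2. Client B reads
store.cas('Martin', value_a + ' +A', cas_a)    # 3. Client A updates
begin
  store.cas('Martin', value_b + ' +B', cas_b)  # 4. Client B fails
rescue ToyCasStore::CasMismatch
  value_b, cas_b = store.gets('Martin')        # re-read, then retry
  store.cas('Martin', value_b + ' +B', cas_b)
end

puts store.gets('Martin').first                # prints "base +A +B"
```

Because Client B re-reads before retrying, its change is applied on top of Client A's update instead of replacing it.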

Server-side Updates

In addition to all the above operations for storing and retrieving information, there are a small number of server-side operations that update the stored data. They cannot be used with JSON documents, because the document information is not parsed, but they can be used for those documents where you are storing raw string or integer data. Here are some:

Increment

When an integer value has been stored, it increments the stored value, either by one, or by the specific increment value. This is particularly useful if you are storing counters (for example, a scorecard, or a count of the number of visitors to a website or object), or using the values for sequences within the code.

For example, to increment by one:

$cb->set('counter',10);
$cb->increment('counter');

To increment by 10:

$cb->set('counter',10);
$cb->increment('counter', 10);

Counters like this are still supported by the view system for indexing the data by using the document metadata. We’ll look at this in more detail when writing some sample views.

Decrement

When an integer value has been stored, it decrements the stored value, either by one, or by the specified decrement value.

Append

Append data to the end of the stored document. Useful for updating strings, or even text-delimited arrays of information. For example:

$cb->set('message','Hello');
$cb->append('message', ' World!');

This produces the familiar ‘Hello World!’ message. For lists, you can append a fixed formatted string. For example:

$cb->set('userlist','martin,');
$cb->append('userlist', 'stuart,');
$cb->append('userlist', 'sharon,');

To get a list of users, you can access the ‘userlist’ record, and split them by commas, and ignore the blank last item.
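As a small sketch of that last step, the following Ruby splits the stored string and discards any blank entry left by the trailing comma. The method name parse_userlist is hypothetical, and the string literal stands in for the value a get would return.

```ruby
# Turn a comma-terminated list such as 'martin,stuart,sharon,' into an
# array, dropping any empty item produced by the trailing comma.
def parse_userlist(raw)
  raw.split(',').reject(&:empty?)
end

# The value as it would be stored after the set and two appends above.
userlist = 'martin,stuart,sharon,'
puts parse_userlist(userlist).inspect
```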

Prepend

Prepend data to the beginning of the stored document.

Remember, these operations are atomic, i.e., they either succeed or fail, and it is impossible for multiple clients calling these operations to overwrite or corrupt the stored information.

Asynchronous Operations

Some client libraries support asynchronous requests, where you can ask for a value and then fetch it later. This is useful if you are building an application that builds a UI or has other operations. For example, you might deliberately take advantage of the asynchronous nature like this:

  1. (Client) Request retrieval of an item

  2. (Client) Builds User Interface (unpopulated)

  3. (Server) Retrieves Item; sends to Client

  4. (Client) Checks whether requested item could be obtained and displays it

You can request the data to be retrieved, perform the UI setup or formatting, and then retrieve the values. This is particularly useful if some of the values have been stored on disk and therefore take slightly longer to retrieve than those stored in the RAM cache.
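The request, build, and collect pattern can be sketched in plain Ruby, with a Thread standing in for the asynchronous client request. Both slow_fetch and render_page are hypothetical names for this illustration, and the short sleep is an assumed stand-in for an item that has to be read from disk; no Couchbase API is used here.

```ruby
# Illustrative only: a Thread plays the role of an asynchronous request.
def slow_fetch(id)
  sleep 0.05                                # pretend the item was read from disk
  "document for #{id}"
end

def render_page(id)
  request = Thread.new { slow_fetch(id) }   # 1. request retrieval of the item
  page = ['<ul>']                           # 2. build the UI while waiting
  item = request.value                      # 3-4. wait for and collect the result
  page << "<li>#{item}</li>" << '</ul>'
  page.join
end

puts render_page('recipe1')
```

The UI setup in step 2 overlaps with the retrieval, so the wait in the final step is only for whatever time the fetch still needs.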

Additionally, they can be used during the storage process to store items in the background while other operations are taking place, such as saving a comment while reloading the existing list of comments.

In PHP, the method is called getDelayed(), and it accepts one (or more) keys to be retrieved. You can either supply a callback function that is called for each document ID/document pair returned, or use the fetch() or fetchAll() functions to get the information back.

For example, using the callback method, you could format the returned documents using code similar to the following:

$format_recipe = function($key, $value) {
   return ('<li>' . $value['title'] . '</li>');
};

$ret = $cb->getDelayed(array('recipe1','recipe2'),0,$format_recipe);

The callback function is supplied two arguments, the document ID and document of each returned item.

Using fetch() you can achieve the same result:

$cb->getDelayed(array('recipe1','recipe2'));

while ($ret = $cb->fetch()) {
  echo('<li>' . $ret['value']['title'] . '</li>');
}

In this case we’ve retrieved each individual document from the database.

Pessimistic Locking

Optimistic locking in Couchbase tries to prevent the updating of a document without the right CAS value. It’s optimistic because it doesn’t protect against the fact that a client can still update the document by using a straightforward set() function to update the document value. Pessimistic locking within Couchbase relies on implementing an explicit locking based mechanism that is ultimately unlocked by updating the value using a CAS compatible function, by explicitly unlocking (using the CAS value), or allowing the lock to optionally time out. With pessimistic locking, it’s impossible to update without using a CAS compatible update function.

Usually locking is used as a more explicit method of enforcing the concurrency elements. While a lock is in place, other clients can obtain the value, but they cannot update it without using a CAS compatible update and having the right CAS value. This enables you to enforce the use of CAS for updates.

To use the lock, you explicitly perform a get operation with an embedded lock request:

$recipe = $cb->getAndLock('recipe1', $cas);

$recipe = 'new value';

# This will fail, because we are not supplying the CAS value

$cb->set('recipe1', $recipe);

# This will succeed and then unlock document

$cb->set('recipe1', $recipe, 0, $cas);

You can also request a lock with a lock expiry. This locks the document for the specified duration, preventing updates until the lock expiry times out.

The lock can also be explicitly released by using the unlock() method:

$cb->unlock('recipe1',$cas);

Note

There’s a default timeout (30 seconds) on the lock to prevent deadlock situations where an item has been locked but never explicitly unlocked.
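The locking behavior described in this section can be modeled with a toy in-memory class. ToyLockStore, its Locked error, and its method names are hypothetical and exist only for this illustration; the lock timeout is deliberately left out to keep the sketch short, and this is not how the server implements locking.

```ruby
# A toy in-memory model of get-and-lock, NOT the Couchbase client.
# Lock timeouts are omitted for brevity.
class ToyLockStore
  class Locked < StandardError; end

  def initialize
    @docs = {}
    @locks = {}      # id => CAS value handed out with the lock
    @next_cas = 0
  end

  # Reads are always allowed, even while the document is locked.
  def get(id)
    @docs[id]
  end

  # Returns the value and the CAS value needed to update or unlock it.
  def get_and_lock(id)
    raise Locked, "#{id} is already locked" if @locks.key?(id)
    @locks[id] = (@next_cas += 1)
    [@docs[id], @locks[id]]
  end

  # A locked document accepts an update only with the matching CAS value;
  # a successful update also releases the lock.
  def set(id, value, cas = nil)
    if @locks.key?(id)
      raise Locked, "#{id} is locked" unless cas == @locks[id]
      @locks.delete(id)
    end
    @docs[id] = value
  end

  def unlock(id, cas)
    raise Locked, "wrong CAS value for #{id}" unless cas == @locks[id]
    @locks.delete(id)
  end
end

store = ToyLockStore.new
store.set('recipe1', 'old value')
_recipe, cas = store.get_and_lock('recipe1')
begin
  store.set('recipe1', 'new value')          # no CAS value: rejected
rescue ToyLockStore::Locked
  puts 'Rejected: no CAS value supplied'
end
store.set('recipe1', 'new value', cas)       # succeeds and releases the lock
puts store.get('recipe1')
```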

Deleting Data

There will always be occasions when you want to explicitly delete information from the database, and the delete() method will handle these requests. It accepts the document ID, and an optional expiry value. To delete an item immediately, call delete():

$cb->delete('message');

Once the item has been deleted, further operations that access the document will operate as if the document had never existed, including get() and replace().
