Chapter 4. Getting Data In and Out
With Couchbase Server installed, and your chosen client library up and running, it is time to start storing information in the database and retrieving it. All of the client libraries use the same basic operations, although they may have different function or method names. Despite the differences between individual client libraries, the basic operations, and how and why you use them, are the same.
A basic list of the key Create, Retrieve, Update, and Delete (CRUD) operations is shown in Table 4-1.
Operation | Description |
add(id, document [, expiry]) | Add an item if ID doesn’t already exist |
set(id, document [, expiry]) | Store a document with ID |
replace(id, document [, expiry]) | Update the document for an existing ID |
cas(id, document, check [, expiry]) | Update the document for an existing ID providing the check matches |
get(id) | Get the specified document |
incr(id [, offset]) | Increment the document value by offset |
decr(id [, offset]) | Decrement the document value by offset |
append(id, value) | Append the content to the end of the current document |
prepend(id, value) | Prepend the content to the beginning of the current document |
delete(id) | Delete the specified document |
There are, however, common use cases and sequences of these operations that can be combined to solve different problems. Let’s look at some of these in the context of a typical web application.
Basic Interface
To start, let’s look at a basic Ruby script that stores some data, retrieves it, and handles the basic responses. A sample program, hello-world.rb, is shown in Example 4-1.
require 'rubygems'
require 'couchbase'
client = Couchbase.new "http://127.0.0.1:8091/pools/default"
client.quiet = false
begin
  spoon = client.get "spoon"
  puts spoon
rescue Couchbase::Error::NotFound => e
  puts "There is no spoon."
  client.set "spoon", "Hello World!", :ttl => 10
end
Dissecting the script reveals the set and get process common in Couchbase:
The first two lines load the necessary libraries.
The next line opens up a connection to your Couchbase Server cluster. Remember that you need to connect to only one node in the cluster. There could be 20 or just one node in the cluster we are connecting to in the example.
The definition is through a URL which should point to at least one node within your cluster. In this example, the localhost address is used. The remainder of the URL is the pools definition which remains the same.
The remainder of the script performs a retrieve and store operation. If the initial retrieve operation (for the document ID "spoon") fails, then we set the data into the database. If the document ID does exist, the script prints out the stored value.
Note
When connecting to the “default” bucket, you do not need to supply any credentials. For all other buckets, you must provide the username (the bucketname), even if no password is specified.
You can test this script out by running it from the command line. The first time you run it, it should output this error string:
shell> ruby hello-world.rb
There is no spoon.
The specified document does not exist in the database, but is added after the error string has been printed. The second time you run it, you should get the stored document value:
shell> ruby hello-world.rb
Hello World!
As an additional demonstration, the welcome string stored has been given an expiry value of 10 seconds. This means that if you wait longer than 10 seconds after you have stored the value, the value will be deleted from the database. If you wait more than 10 seconds from the first time you ran the script and execute the script again, it should output this error string:
shell> ruby hello-world.rb
There is no spoon.
For comparison, the same script, written in PHP:
<?php
$cb = new Couchbase("127.0.0.1:8091", "", "", "default");
$spoon = $cb->get("spoon");
if ($spoon) {
    echo "$spoon";
} else {
    echo "There is no spoon.";
    $cb->set("spoon", "Hello World!", 10);
}
?>
Although this is a very basic example, it demonstrates the simplicity of retrieving and storing information into Couchbase Server, and the expiry of information.
The basic sequence shown here is one that will be replicated in many different places in your code, and it’s important to note the simplicity of the document-based interface. Note that there are no complex SQL statements to write, no need to worry about creating a structure to hold the data, or any need to worry about the size and complexity of the cluster holding the information.
Now let’s start looking at some more specific examples of different areas of the Couchbase Server interface.
Document Identifiers
Data is stored by recording a block of data against a given document ID. Because the document ID is the primary method for retrieving, and updating, the information that you store, some care needs to be taken to choose your document ID.
The document ID is required, and Couchbase will not automatically create one for you if an explicit ID is not specified. It should go without saying that document IDs should also be unique; you can only retrieve a single document back from a given ID. Use the same ID and you will overwrite the existing information.
There are a number of different strategies available:
- Convert a field to a unique ID
Certain fields or values make good document IDs. For example, if you are storing session data, or even user data, you can use the unique session ID or the user’s email address as a suitable document ID. For other data, such as a user’s name (there are numerous Martin Browns out there), or recipe name (multiple varieties of Lasagne) a different ID structure should be used.
- Store objects and sequences
You can make use of the raw numerical storage and the increment operation to create a sequence ID. Create a document, “recipe-sequence” and store a bare integer. Each time you create a new recipe, increment the value, and append that to the end of a string, such as “recipe_”. Because the increment operation is atomic, the number will increment and should never be repeated.
- Use a UUID
There are numerous UUID solutions available in different languages that can be used to create a unique identifier for the purpose of storing a document. UUIDs make the process of storing the information very simple and straightforward.
Of course, there is nothing to stop you using combinations or basic elements of all three. For example, you might store global data into “named” documents that you can easily identify and explicitly reference in your code, while storing user data in sequenced documents.
Another best practice approach, regardless of which document ID structure you use, is to prefix your document ID with its type, for example user_98475894756 and object_3978547645. When storing JSON, consider using a type field as well to make it clear what the document type is.
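The ID strategies above can be sketched in Ruby without a running cluster. This is only an illustration: `SequenceCounter`, `sequence_id`, and `uuid_id` are hypothetical names, and the counter class is an in-memory stand-in for an atomic increment on a counter document such as “recipe-sequence”. The only real library call is `SecureRandom.uuid` from the Ruby standard library.

```ruby
require 'securerandom'

# Hypothetical stand-in for a server-side counter document.
# A Mutex makes the increment atomic within this process, mimicking
# the atomic increment operation described in the text.
class SequenceCounter
  def initialize(start = 0)
    @value = start
    @lock = Mutex.new
  end

  # Returns the new value after incrementing.
  def increment
    @lock.synchronize { @value += 1 }
  end
end

# Build a type-prefixed ID from the next sequence number.
def sequence_id(type, counter)
  "#{type}_#{counter.increment}"
end

# Build a type-prefixed ID from a random UUID.
def uuid_id(type)
  "#{type}_#{SecureRandom.uuid}"
end

counter = SequenceCounter.new
puts sequence_id('recipe', counter)
puts sequence_id('recipe', counter)
puts uuid_id('user')
```

Sequence IDs stay short, which matters because document IDs are held in memory; a prefixed UUID is longer but needs no shared counter.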
Tip
Be careful with the length of your document IDs. All document metadata, including the document ID, is stored in memory at all times, so longer IDs make for a larger RAM footprint, which reduces the RAM available for caching the document data.
One area to be careful of is how you split up and define the document and data. We look at that in more detail in Chapter 3.
Time to Live (TTL)
As we saw in the opening example, you can store documents with an expiry time. All values stored within Couchbase Server have an optional expiry, or Time to Live (TTL) value. The expiry is designed for use on data that has a natural lifespan, such as session data or shopping baskets. The expiry time is configured using a simple numerical value, and the value is interpreted differently according to its size:
Any value smaller than 30 days (i.e., 30 * 24 * 60 * 60) is taken as a relative value in seconds. For example, storing a document with an expiry of 600 will expire the document in 10 minutes.
Any value larger than this is taken as an absolute value from the epoch, also in seconds. For example, 1381921696 would be 16th October 2013.
A value of zero (or not supplying an expiry time) means that the document has no expiry. The document will need to be explicitly deleted to be removed from the database.
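The three rules above can be expressed as a small Ruby helper. This is a sketch of the interpretation described in the text, not server code: the constant `THIRTY_DAYS` and the method `expiry_time` are hypothetical names, and the boundary case of exactly 30 days, which the text does not specify, is treated here as absolute by assumption.

```ruby
# Values below this threshold are treated as relative offsets in seconds.
THIRTY_DAYS = 30 * 24 * 60 * 60

# Return the Time at which a document would expire, or nil for no expiry.
# `now` is passed in so the calculation is easy to check.
def expiry_time(expiry, now = Time.now)
  return nil if expiry.nil? || expiry.zero?

  if expiry < THIRTY_DAYS
    now + expiry      # relative: seconds from now
  else
    Time.at(expiry)   # absolute: seconds since the epoch
  end
end

now = Time.at(1_381_000_000)
puts expiry_time(600, now).to_i           # ten minutes after `now`
puts expiry_time(1_381_921_696, now).utc  # a date in October 2013
puts expiry_time(0, now).inspect          # nil: never expires
```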
The expiry time can be set when initially storing the value and when updating it. For convenience, you can also get the value and “touch” the expiry to update it. This is useful for session and other information where retrieving the record also implies the data should stay around longer. You can also explicitly touch (without a get) the expiry on a document.
Storing Data
To start with, you need to store data into your database so that you can later retrieve it. There are two primary storage methods:
set(docid, docdata [, expiry])
Store the specified document data against the document ID. This operation is an explicit set—that is, it will always store the document data against the ID you supply, irrespective of whether the document ID already exists, or what the corresponding value is.
For example:
$cb->set('message', 'Hello World!');
This sets the document ‘message’ to ‘Hello World!’.
The expiry is an optional value, although different client libraries expose this in different ways. In PHP it’s an optional argument which can be added to the end of our method call. For example, to add a ten second expiry:
$cb->set('message', 'Hello World!', 10);
You can test the success of the operation by checking the return value in PHP:
if ($cb->set("spoon", "Hello World!", 10)) {
    echo "Message stored!";
}
In languages that support exceptions (Python, Ruby, .NET and Java) the exception system provides the information about whether individual operations succeed or not.
add(docid, docdata [, expiry])
The add() function works differently from set(). With add() the operation will fail if the document ID you specify already exists. For example, in the code below, the first operation will complete successfully, and the second will fail:
$cb->add('message', 'Hello World!');
$cb->add('message', 'I pushed the button, but nothing happened!');
The add() function is useful when you are storing data into the database for the first time. For example, consider storing user data using the email address as the document ID. Using set() would overwrite an old user record. Using add() would fail the registration and indicate the user should use the password recovery system.
The add() function is also atomic, so multiple writers can be adding data to the cluster at the same time, and it’s safe to use within a multi-threaded environment.
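The difference between set() and add() in the registration scenario can be sketched as follows. `MemoryBucket` and `register` are hypothetical names: the class is an in-memory stand-in that only mimics the semantics described above, and it is not the Couchbase client API.

```ruby
# Hypothetical in-memory stand-in for a bucket; not the Couchbase client API.
class MemoryBucket
  class KeyExists < StandardError; end

  def initialize
    @docs = {}
    @lock = Mutex.new
  end

  # set always stores, overwriting any existing document.
  def set(id, doc)
    @lock.synchronize { @docs[id] = doc }
  end

  # add stores only if the ID is not already present; the check and the
  # write happen under one lock, so two writers cannot both succeed.
  def add(id, doc)
    @lock.synchronize do
      raise KeyExists, id if @docs.key?(id)
      @docs[id] = doc
    end
  end

  def get(id)
    @lock.synchronize { @docs[id] }
  end
end

# Register a user keyed by email; returns false if already registered.
def register(bucket, email, profile)
  bucket.add(email, profile)
  true
rescue MemoryBucket::KeyExists
  false
end

bucket = MemoryBucket.new
puts register(bucket, 'mc@example.com', 'first profile')   # true
puts register(bucket, 'mc@example.com', 'second profile')  # false
puts bucket.get('mc@example.com')                          # first profile
```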
Regardless of how you store the data, the actual format of the information you are storing is also important. For more details on designing data structures for document databases, see Chapter 3.
Retrieving Data
To retrieve a document from the database, you need only supply the document ID of the document that you want to retrieve. The retrieve operations are designed so that the get() operation will return an error if the corresponding document ID does not exist.
There is only one main function, get(). For example:
$message = $cb->get('message');
If the value comes back as undefined in PHP, then the specified document ID did not exist.
Retrieving in Bulk
There are times when you will have multiple different documents to collect. For example, if you’ve made use of Views, then you might have identified a number of documents that you want to retrieve. If you are using links within a document to other documents, you might want to load all of them at the same time.
As a rule, loading multiple documents in bulk is faster and more efficient than loading the documents individually, largely because the latency of each separate request and response is removed by streaming the response for multiple documents.
Different libraries implement and expose this functionality in different ways, but the result is generally the same; either an array of the documents that you requested, or if supported by the language, a hash of the document IDs and corresponding document data. For example, in PHP the function is:
$ret = $cb->getMulti(array('recipe1','recipe2'));
This returns a PHP associative array, with each key/value pair as the document ID and document data. The document data is undefined if the requested document could not be retrieved.
Updating Data
Updating data in Couchbase Server should be performed carefully. You can only update entire documents, and because the basic structure is the same (document data stored against a document ID), the format is similar to that used when initially storing a document. In fact, you can use the set() function mentioned earlier. The problem is that you may not want to update a document that doesn’t already exist. For this, you can use the replace() method. This only updates the document data if the specified document ID already exists. For example:
$cb->replace("welcome-message", "Hello World!");
The above will fail until we use either add() or set() to create the document ID.
Generally the process for updating information is to load the record, update the information in it, and then use replace() or set() to save it back again:
$message = $cb->get('message');
$message = $message . " How are you today?";
$cb->replace('message', $message);
When updating JSON-based documents, the operation is the same; you must load the existing document before updating a field and saving it back:
$record = array('name' => 'MC Brown', 'company' => 'Couchbase');
$cb->set('user', json_encode($record));
$newrecord = json_decode($cb->get('user'), true);
$newrecord["nickname"] = "MC";
$cb->replace('user', json_encode($newrecord));
Note
To ensure you are not overwriting data that might have been updated by another client, you should use the CAS operation.
The above example uses the built-in json_encode() and json_decode() functions to serialize an internal associative array within PHP into JSON for storage and back again.
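For comparison, the same load, modify, and save cycle might look like this in Ruby. The sketch assumes a plain Hash as a stand-in for the bucket, and `update_json` is a hypothetical helper; `JSON.parse` and `JSON.generate` come from Ruby's standard json library.

```ruby
require 'json'

# Load a JSON document, let the caller change it, and save the whole
# document back. `store` is a plain Hash standing in for a bucket.
def update_json(store, id)
  # Mimic replace(): refuse to update a document that does not exist.
  raise KeyError, "no such document: #{id}" unless store.key?(id)

  doc = JSON.parse(store[id])
  yield doc
  store[id] = JSON.generate(doc)
end

store = {}
store['user'] = JSON.generate({ 'name' => 'MC Brown', 'company' => 'Couchbase' })
update_json(store, 'user') { |doc| doc['nickname'] = 'MC' }
puts store['user']
```

Because the whole document is rewritten on every change, two clients running this cycle at the same time can still overwrite each other.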
Concurrent Updates
In any highly concurrent environment, particularly many modern websites, you must make sure that you do not try to update the same information from two or more clients simultaneously. This is even more important in an environment such as Couchbase Server where updates occur very quickly, and where you are storing larger documents that may contain a lot of compound information.
For example, consider the following scenario:
Client A gets the value for the document “Martin”.
Client B gets the value for the document “Martin”.
Client A adds information to the document value and updates it.
Client B adds information to the document value and updates it.
In the above sequence, the update by Client B will overwrite the information in the database, removing the data that Client A added.
To provide a solution to this, you can use the compare and swap (cas()) function, which uses an additional check value to ensure that the version of the document retrieved is the same as the one currently stored on the server.
The result is a change to the above sequence:
Client A gets the value for the document “Martin” and the CAS ID.
Client B gets the value for the document “Martin” and the CAS ID.
Client A adds information to the document value and updates it, using the CAS ID as a check. The document is updated.
Client B adds information to the document value and tries to update it using the CAS ID. The operation fails, because the cached CAS ID on client B is now different from the CAS ID on the server after the update by client A.
CAS therefore supports an additional level of checking and verifies that the information you are updating matches the copy of the information you originally retrieved. CAS enforces what Couchbase calls optimistic locking, that is, we hope that we are the only client performing an update that has the right CAS value, and that all clients always use a CAS function to do updates.
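The four-step sequence above can be simulated with a small in-memory model. Everything here is an illustrative assumption: `CasBucket` and `update_with_cas` are hypothetical names, and a simple incrementing integer stands in for the CAS value. The sketch also shows a common follow-up, which is to re-read and retry when the check fails.

```ruby
# Hypothetical in-memory stand-in for a bucket with CAS values.
class CasBucket
  class CasMismatch < StandardError; end

  def initialize
    @docs = {}       # id => [value, cas]
    @next_cas = 0
    @lock = Mutex.new
  end

  def set(id, value)
    @lock.synchronize { store(id, value) }
  end

  # Return the value together with its current CAS value.
  def gets(id)
    @lock.synchronize { @docs.fetch(id).dup }
  end

  # Store only if the caller's CAS value matches the stored one.
  def cas(id, value, check)
    @lock.synchronize do
      raise CasMismatch, id unless @docs.fetch(id)[1] == check
      store(id, value)
    end
  end

  private

  # Every successful write gives the document a new CAS value.
  def store(id, value)
    @next_cas += 1
    @docs[id] = [value, @next_cas]
    @next_cas
  end
end

# Re-read and re-apply the change whenever the CAS check fails.
def update_with_cas(bucket, id, attempts = 5)
  attempts.times do
    value, check = bucket.gets(id)
    begin
      return bucket.cas(id, yield(value), check)
    rescue CasBucket::CasMismatch
      next
    end
  end
  raise CasBucket::CasMismatch, id
end

bucket = CasBucket.new
bucket.set('Martin', 'base')
a_value, a_cas = bucket.gets('Martin')        # Client A reads
b_value, b_cas = bucket.gets('Martin')        # Client B reads
bucket.cas('Martin', a_value + '+A', a_cas)   # A's update succeeds
begin
  bucket.cas('Martin', b_value + '+B', b_cas) # B's CAS value is now stale
rescue CasBucket::CasMismatch
  update_with_cas(bucket, 'Martin') { |v| v + '+B' } # B retries on fresh data
end
puts bucket.gets('Martin').first              # base+A+B
```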
Within your code, CAS is a function just like the other update functions, such as replace(). Depending on your environment, you may need to use a special get function (gets()) that obtains both the document information and CAS value.
For example, within PHP you would update an existing document through CAS first by getting the value and stored CAS value, and then using the cas() method to update the document:
$value = $cb->get("customer", NULL, $casvalue);
$response = $cb->cas($casvalue, "customer", "new string value");
The limitation of using CAS is that it is not enforceable at a database level. If you want to use it for all the update operations, you must explicitly use it over the standard document update functions across your entire application. Care should be taken to use the operation in the right place. A CAS update is slower than a simple set operation, and to get the best performance you should use the fastest operation appropriate to your application and update requirements.
Server-side Updates
In addition to all the above operations for storing and retrieving information, there are a small number of server-side operations that update the stored data. They cannot be used with JSON documents, because the document information is not parsed, but they can be used for those documents where you are storing raw string or integer data. Here are some:
- Increment
When an integer value has been stored, it increments the stored value, either by one, or by the specific increment value. This is particularly useful if you are storing counters (for example, a scorecard, or a count of the number of visitors to a website or object), or using the values for sequences within the code.
For example, to increment by one:
$cb->set('counter', 10);
$cb->increment('counter');
To increment by 10:
$cb->set('counter', 10);
$cb->increment('counter', 10);
Counters like this are still supported by the view system for indexing the data by using the document metadata. We’ll look at this in more detail when writing some sample views.
- Decrement
When an integer value has been stored, it decrements the stored value, either by one, or by the specified decrement value.
- Append
Append data to the end of the stored document. Useful for updating strings, or even text-delimited arrays of information. For example:
$cb->set('message', 'Hello');
$cb->append('message', ' World!');
This produces the familiar “Hello World!” message. For lists, you can append a fixed formatted string. For example:
$cb->set('userlist', 'martin,');
$cb->append('userlist', 'stuart,');
$cb->append('userlist', 'sharon,');
To get a list of users, you can access the ‘userlist’ record, and split them by commas, and ignore the blank last item.
- Prepend
Prepend data to the beginning of the stored document.
Remember, these operations are atomic: they either succeed or fail, and it is impossible for multiple clients calling these operations to overwrite or corrupt the stored information.
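To illustrate why atomicity matters for counters and appended lists, here is a minimal Ruby sketch. `RawStore` is a hypothetical in-memory stand-in in which a Mutex plays the role of the server performing each operation as a single step; it is not the Couchbase client API.

```ruby
# Hypothetical in-memory stand-in for raw (non-JSON) values.
class RawStore
  def initialize
    @docs = {}
    @lock = Mutex.new
  end

  def set(id, value)
    @lock.synchronize { @docs[id] = value }
  end

  def get(id)
    @lock.synchronize { @docs[id] }
  end

  # The read-modify-write happens under one lock, so no update is lost.
  def increment(id, offset = 1)
    @lock.synchronize { @docs[id] = Integer(@docs[id]) + offset }
  end

  def append(id, suffix)
    @lock.synchronize { @docs[id] = @docs[id].to_s + suffix }
  end
end

store = RawStore.new
store.set('counter', 10)
# Ten concurrent writers each increment the counter 100 times.
threads = 10.times.map do
  Thread.new { 100.times { store.increment('counter') } }
end
threads.each(&:join)
puts store.get('counter')                      # 1010

store.set('userlist', 'martin,')
store.append('userlist', 'stuart,')
store.append('userlist', 'sharon,')
# String#split drops trailing empty fields, so the blank last item
# left by the final comma is ignored automatically.
puts store.get('userlist').split(',').inspect  # ["martin", "stuart", "sharon"]
```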
Asynchronous Operations
Some client libraries support asynchronous requests, where you can ask for a value and then fetch it later. This is useful if you are building an application that builds a UI or has other operations to perform in the meantime. For example, you might deliberately take advantage of the asynchronous nature like this:
(Client) Request retrieval of an item
(Client) Builds User Interface (unpopulated)
(Server) Retrieves Item; sends to Client
(Client) Checks whether requested item could be obtained and displays it
You can request the data to be retrieved, perform the UI setup or formatting, and then retrieve the values. This is particularly useful if some of the values have been stored on disk and therefore take slightly longer to retrieve than those stored in the RAM cache.
Additionally, they can be used during the storage process to store items in the background while other operations are taking place, such as saving a comment while reloading the existing list of comments.
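The request, build, and collect pattern described above can be sketched in Ruby using a Thread as the handle for the pending result. This only illustrates the flow: `DOCS`, `slow_get`, and `render_recipe` are hypothetical names, and the sleep stands in for a document that has to be read from disk.

```ruby
# Hypothetical document set and slow lookup standing in for a disk read.
DOCS = { 'recipe1' => { 'title' => 'Lasagne' } }.freeze

def slow_get(id)
  sleep 0.05
  DOCS[id]
end

def render_recipe(id)
  # Request retrieval; the Thread is a handle for the pending result.
  pending = Thread.new { slow_get(id) }

  # Build the (unpopulated) user interface while the request is in flight.
  page = ['<ul>']

  # Collect the result, and display it only if the item could be obtained.
  recipe = pending.value
  page << "<li>#{recipe['title']}</li>" if recipe
  page << '</ul>'
  page.join
end

puts render_recipe('recipe1')   # <ul><li>Lasagne</li></ul>
puts render_recipe('missing')   # <ul></ul>
```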
In PHP, the method is called getDelayed(), and it accepts one (or more) keys to be retrieved. You can optionally elect to either use a callback function for each document ID/document pair that is returned, or you can use the fetch() or fetchAll() functions to get the information back.
For example, using the callback method, you could format the returned documents using code similar to the following:
$format_recipe = function($key, $value) {
    return ('<li>' . $value['title'] . '</li>');
};
$ret = $cb->getDelayed(array('recipe1','recipe2'), 0, $format_recipe);
The callback function is supplied two arguments, the document ID and document of each returned item.
Using fetch() you can achieve the same result:
$cb->getDelayed(array('recipe1','recipe2'), 0);
while ($ret = $cb->fetch()) {
    echo('<li>' . $ret['value']['title'] . '</li>');
}
In this case we’ve retrieved each individual document from the database.
Pessimistic Locking
Optimistic locking in Couchbase tries to prevent the updating of a document without the right CAS value. It’s optimistic because it doesn’t protect against the fact that a client can still update the document by using a straightforward set() function to update the document value. Pessimistic locking within Couchbase relies on implementing an explicit locking-based mechanism that is ultimately unlocked by updating the value using a CAS compatible function, by explicitly unlocking (using the CAS value), or allowing the lock to optionally time out. With pessimistic locking, it’s impossible to update without using a CAS compatible update function.
Usually locking is used as a more explicit method of enforcing the concurrency elements. While a lock is in place, other clients can obtain the value, but they cannot update it without using a CAS compatible update and having the right CAS value. This enables you to enforce the use of CAS for updates.
To use the lock, you explicitly perform a get operation with an embedded lock request:
$recipe = $cb->getAndLock('recipe1', $cas);
$recipe = 'new value';
# This will fail, because we are not supplying the CAS value
$cb->set('recipe1', $recipe);
# This will succeed and then unlock document
$cb->set('recipe1', $recipe, 0, $cas);
You can also request a lock with a lock expiry. This locks the document for the specified duration, preventing updates until the lock expiry times out.
The lock can also be explicitly released by using the unlock() method:
$cb->unlock('recipe1',$cas);
Note
There’s a default timeout (30 seconds) on the lock to prevent deadlock situations where an item has been locked but never explicitly unlocked.
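The lock behaviour described in this section (rejecting plain updates, unlocking on a CAS update, explicit unlock, and timeout) can be modelled in a few lines of Ruby. This is a single-threaded sketch under stated assumptions, not the Couchbase client API: `LockingBucket` is a hypothetical class, the 30-second default is taken from the note above, the clock is injectable so the timeout can be demonstrated without waiting, and CAS values are only checked while a lock is held.

```ruby
# Hypothetical in-memory model of get-and-lock; not the Couchbase client API.
class LockingBucket
  class Locked < StandardError; end

  def initialize(clock = -> { Time.now.to_f })
    @clock = clock
    @docs = {}          # id => value
    @cas = {}           # id => current CAS value
    @locked_until = {}  # id => time at which the lock lapses
    @next_cas = 0
  end

  # Store a value. While the document is locked, the right CAS is required.
  def set(id, value, cas = nil)
    check!(id, cas)
    @docs[id] = value
    @locked_until.delete(id)     # a successful CAS update releases the lock
    @cas[id] = (@next_cas += 1)
  end

  # Lock the document and return [value, cas].
  def get_and_lock(id, timeout = 30)
    check!(id, nil)              # cannot lock a document that is already locked
    @cas[id] = (@next_cas += 1)  # fresh CAS value known only to the locker
    @locked_until[id] = @clock.call + timeout
    [@docs[id], @cas[id]]
  end

  def unlock(id, cas)
    check!(id, cas)
    @locked_until.delete(id)
  end

  private

  def locked?(id)
    @locked_until.key?(id) && @clock.call < @locked_until[id]
  end

  # Reject the operation if the document is locked and the CAS is wrong.
  def check!(id, cas)
    raise Locked, id if locked?(id) && cas != @cas[id]
  end
end

now = 0.0
bucket = LockingBucket.new(-> { now })
bucket.set('recipe1', 'old value')
_value, cas = bucket.get_and_lock('recipe1')
begin
  bucket.set('recipe1', 'new value')     # no CAS value: rejected
rescue LockingBucket::Locked
  puts 'update rejected while locked'
end
bucket.set('recipe1', 'new value', cas)  # correct CAS: stored and unlocked

bucket.get_and_lock('recipe1')           # lock again, then let it lapse
now += 31
bucket.set('recipe1', 'after timeout')   # succeeds: the lock has timed out
```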
Deleting Data
There will always be occasions when you want to explicitly delete information from the database, and the delete() method will handle these requests. It accepts the document ID, and an optional expiry value. To delete an item immediately, call delete():
$cb->delete('message');
Once the item has been deleted, further operations that access the document will operate as if the document had never existed, including get() and replace().