Node: Up and Running


Patterns

Event-driven programming is different from procedural programming. The easiest way to learn it is to practice routine patterns that have been discovered by previous generations of programmers. That is the purpose of this section.

Before we launch into patterns, we’ll take a look at what is really happening behind various programming styles to give the patterns some context. Most of this section will focus on I/O, because, as discussed in the previous section, event-driven programming is focused on solving problems with I/O. When it is working with data in memory that doesn’t require I/O, Node can be completely procedural.

The I/O Problem Space

We’ll start by looking at the types of I/O required in efficient systems. These will be the basis of our patterns.

The first obvious distinction to look at is serial versus parallel I/O. Serial is obvious: do this I/O, and after it is finished, do that I/O. Parallel is more complicated to implement but also easy to understand: do this I/O and that I/O at the same time. The important point here is that ordering is normally considered implicit in serial tasks, but parallel tasks could return in any order.

Groups of serial and parallel work can also be combined. For example, two groups of parallel requests could execute serially: do this and that together, then do other and another together.

In Node, we assume that all I/O has unbounded latency. This means that any I/O tasks could take from 0 to infinite time. We don’t know, and can’t assume, how long these tasks take. So instead of waiting for them, we use placeholders (events), which then fire callbacks when the I/O happens. Because we have assumed unbounded latency, it’s easy to perform parallel tasks. You simply make a number of calls for various I/O tasks. They will return whenever they are ready, in whatever order that happens to be. Ordered serial requests are also easy to make by nesting or referencing callbacks together so that the first callback will initiate the second I/O request, the second callback will initiate the third, and so on. Even though each request is asynchronous and doesn’t block the event loop, the requests are made in serial. This pattern of ordered requests is useful when the results of one I/O operation have to inform the details of the next I/O request.

So far, we have two ways to do I/O: ordered serial requests and unordered parallel requests. Ordered parallel requests are also a useful pattern; they happen when we allow the I/O to take place in parallel, but we deal with the results in a particular sequence. Unordered serial I/O offers no particular benefits, so we won’t consider it as a pattern.
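The ordered parallel pattern can be sketched as follows. This is an illustrative example, not an API from Node itself: `fakeRead` stands in for a real asynchronous I/O call such as fs.readFile, with setTimeout simulating unbounded latency. The trick is to store each result by its request index rather than by arrival order.

```javascript
// Ordered parallel I/O: issue all the requests at once, but keep the
// results in request order. fakeRead is a hypothetical stand-in for a
// real async I/O call; setTimeout simulates unbounded latency.
function fakeRead(name, delay, callback) {
  setTimeout(function() {
    callback(null, 'contents of ' + name);
  }, delay);
}

var names = ['a.txt', 'b.txt', 'c.txt'];
var results = [];
var remaining = names.length;

names.forEach(function(name, i) {
  fakeRead(name, Math.random() * 50, function(err, data) {
    results[i] = data; // store by request index, not by arrival order
    if (--remaining === 0) {
      // all requests are done; results is in the order we made the requests
      results.forEach(function(r) {
        console.log(r);
      });
    }
  });
});
```

Whichever request finishes first, the final `results` array always matches the order in which the requests were made.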

Unordered parallel I/O

Let’s start with unordered parallel I/O (Example 3-3) because it’s by far the easiest to do in Node. In fact, all I/O in Node is unordered parallel by default. This is because all I/O in Node is asynchronous and nonblocking. When we do any I/O, we simply throw the request out there and see what happens. It’s possible that all the requests will happen in the order we made them, but maybe they won’t. When we talk about unordered, we don’t mean randomized, but simply that there is no guaranteed order.

Example 3-3. Unordered parallel I/O in Node

fs.readFile('foo.txt', 'utf8', function(err, data) {
  console.log(data);
});
fs.readFile('bar.txt', 'utf8', function(err, data) {
  console.log(data);
});

Simply making I/O requests with callbacks will create unordered parallel I/O. At some point in the future, both of these callbacks will fire. Which happens first is unknown, and either one could return an error rather than data without affecting the other request.
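When you do need to know that all the parallel requests have finished, a common approach is a simple completion counter. The sketch below uses `fakeIO` as an illustrative stand-in for calls like fs.readFile; it is not part of Node's API.

```javascript
// A completion counter for unordered parallel I/O: we don't care which
// request finishes first, only when both are done. fakeIO is a hypothetical
// stand-in for a real async I/O call such as fs.readFile.
function fakeIO(name, callback) {
  setTimeout(function() {
    callback(null, name.toUpperCase());
  }, Math.random() * 20); // unpredictable latency
}

var pending = 2;
var collected = {};

function finished(key, data) {
  collected[key] = data;
  if (--pending === 0) {
    // both callbacks have fired, in whatever order they happened to finish
    console.log(collected.foo, collected.bar);
  }
}

fakeIO('foo', function(err, data) { finished('foo', data); });
fakeIO('bar', function(err, data) { finished('bar', data); });
```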

Ordered serial I/O

In this pattern, we want to do some I/O (unbounded latency) tasks in sequence. Each previous task must be completed before the next task is started. In Node, this means nesting callbacks so that the callback from each task starts the next task, as shown in Example 3-4.

Example 3-4. Nesting callbacks to produce serial requests

server.on('request', function(req, res) {
  //get session information from memcached
  memcached.getSession(req, function(session) {
    //get information from db
    db.get(session.user, function(userData) {
      //some other web service call
      ws.get(req, function(wsData) {
        //render page
        page = pageRender(req, session, userData, wsData);
        //output the response
        res.write(page);
      });
    });
  });
});

Although nesting callbacks allows easy creation of ordered serial I/O, it also creates so-called “pyramid” code.[6] This code can be hard to read and understand, and as a consequence, hard to maintain. For instance, a glance at Example 3-4 doesn’t reveal that the completion of the memcached.getSession request launches the db.get request, that the completion of the db.get request launches the ws.get request, and so on. There are a few ways to make this code more readable without breaking the fundamental ordered serial pattern.

First, we can continue to use inline function declarations, but we can name them, as in Example 3-5. This makes debugging a lot easier as well as giving an indication of what the callback is going to do.

Example 3-5. Naming function calls in callbacks

server.on('request', function getMemCached(req, res) {
  memcached.getSession(req, function getDbInfo(session) {
    db.get(session.user, function getWsInfo(userData) {
      ws.get(req, function render(wsData) {
        //render page
        page = pageRender(req, session, userData, wsData);
        //output the response
        res.write(page);
      });
    });
  });
});

Another approach that changes the style of the code is to use declared functions instead of anonymous or named inline ones. This removes the pyramid shape that made the order of execution visible in the previous approaches, but it also breaks the code out into more manageable chunks (see Example 3-6).

Example 3-6. Using declared functions to separate out code

var render = function(wsData) {
  page = pageRender(req, session, userData, wsData);
}; 

var getWsInfo = function(userData) {
  ws.get(req, render);
};

var getDbInfo = function(session) {
  db.get(session.user, getWsInfo);
};

var getMemCached = function(req, res) {
  memcached.getSession(req, getDbInfo);
};

The code shown in this example won't actually work as written. The original nested code used closures to make variables from earlier callbacks available to later ones, and these standalone declared functions lose that shared scope. Declared functions are therefore a good fit when state doesn't need to be maintained across three or more callbacks: if each callback needs only the information passed to it by the previous one, this style can be a lot more readable (especially with documentation) than a huge lump of nested functions.

There are, of course, ways of passing data between functions. Mostly it comes down to using the features of the JavaScript language itself. JavaScript has function scope: when you declare a variable with var inside a function, it is local to that function, but simply having { and } does not limit a variable's scope. This allows us to define variables in an outer callback that the inner callbacks can still access, even after the outer callback has "closed" by returning. When we nest callbacks, we implicitly bind the variables from all the previous callbacks into the most recently defined callback. It just turns out that lots of nesting isn't very easy to work with.
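The scoping rules described above can be seen in a small self-contained sketch:

```javascript
// Function scope and closures in action. Braces alone don't create a new
// scope; only functions do, and inner functions keep access to outer
// variables even after the outer function has returned.
function outer() {
  var shared = 'captured';
  if (true) {
    var alsoShared = 'no block scope'; // still scoped to outer, not the if
  }
  return function inner() {
    // inner closes over outer's variables
    return shared + ' / ' + alsoShared;
  };
}

var fn = outer();
console.log(fn());
```

Even though `outer` has returned by the time `fn` is called, the inner function can still read both variables. This is exactly the mechanism nested callbacks rely on.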

We can still perform the flattening refactoring we did, but we should do it within the shared scope of the original request, to form a closure environment around all the callbacks we want to do. This way, all the callbacks relating to that initial request can be encapsulated and can share state via variables in the encapsulating callback (Example 3-7).

Example 3-7. Encapsulating within a callback

server.on('request', function(req, res) {

  //shared state, visible to all the callbacks below via the closure
  var session, userData;

  var render = function(wsData) {
    page = pageRender(req, session, userData, wsData);
  };

  var getWsInfo = function(data) {
    userData = data;
    ws.get(req, render);
  };

  var getDbInfo = function(sess) {
    session = sess;
    db.get(session.user, getWsInfo);
  };

  var getMemCached = function(req, res) {
    memcached.getSession(req, getDbInfo);
  };

  //kick off the chain for this request
  getMemCached(req, res);
});

Not only does this approach organize code in a logical way, but it also allows you to flatten a lot of the callback hell.

Other organizational innovations are also possible. Sometimes there is code you want to reuse across many functions. This is the province of middleware. There are many ways to do middleware. One of the most popular in Node is the model used by the Connect framework, which could be said to be based on Rack from the Ruby world. The general idea behind its implementation is that we pass around some variables that represent not only the state but also the methods of interacting with that state.

In JavaScript, objects are passed by reference. That means that when you call myFunction(someObject), any changes you make to someObject inside myFunction are visible everywhere that object is referenced. This is potentially tricky, but it gives you great power if you are careful about the side effects you create. Side effects are especially dangerous in asynchronous code: when something modifies an object used by a callback, it can be very difficult to figure out when that change happened, because it happens in a nonlinear order. If you take advantage of the ability to change objects passed as arguments, be mindful of where those objects are going to be used.

The basic idea is to take something that represents the state and pass it between all functions that need to act on that state. This means that all the things acting on the state need to have the same interface so they can pass between themselves. This is why Connect (and therefore Express) middleware all takes the form function(req, res, next). We discuss Connect/Express middleware in more detail in Chapter 7.
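The chaining idea behind that shared interface can be sketched in a few lines. This is an illustration of the function(req, res, next) pattern only, not Connect's actual implementation; `run`, `logger`, and `responder` are hypothetical names.

```javascript
// A minimal sketch of chaining functions that share the
// function(req, res, next) interface. Each layer either hands control
// to the next layer by calling next(), or ends the chain by not calling it.
function run(layers, req, res) {
  var i = 0;
  function next() {
    var layer = layers[i++];
    if (layer) {
      layer(req, res, next);
    }
  }
  next();
}

function logger(req, res, next) {
  res.log = (res.log || []).concat('saw ' + req.url);
  next(); // hand control to the next layer
}

function responder(req, res, next) {
  res.body = 'handled ' + req.url; // last layer: don't call next()
}

var req = { url: '/hello' };
var res = {};
run([logger, responder], req, res);
console.log(res.body);
```

Because every layer takes the same three arguments, layers can be added, removed, and reordered freely, which is exactly what makes middleware composable.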

In the meantime, let’s look at the basic approach, shown in Example 3-8. When we share objects between functions, functions earlier in the call chain can change the state of those objects so that functions called later see the changes.

Example 3-8. Passing changes between functions

var AwesomeClass = function() {
  this.awesomeProp = 'awesome!'
  this.awesomeFunc = function(text) {
    console.log(text + ' is awesome!')
  }
}

var awesomeObject = new AwesomeClass()

function middleware(func) {
  var oldFunc = func.awesomeFunc
  func.awesomeFunc = function(text) {
    text = text + ' really'
    oldFunc(text)
  }
}

function anotherMiddleware(func) {
  func.anotherProp = 'super duper' 
}

function caller(input) {
  input.awesomeFunc(input.anotherProp)
}

middleware(awesomeObject)
anotherMiddleware(awesomeObject)
caller(awesomeObject)


[6] This term was coined by Tim Caswell.
