
A guest post by Pedro Teixeira, a geek, programmer, freelancer, and entrepreneur. He is the author of several Node.js modules and books about Node.js, the host of the Node Tuts screencast show, and an overall fervent proclaimer of the Node.js creed. He is also the Co-founder and Partner of The Node Firm, and the organizer of the Lisbon JavaScript Conference.

The UNIX Philosophy

The UNIX operating system was created at Bell Labs in 1969. It was adopted first by the academic world and later by industry, becoming the root of the majority of modern operating systems. In this post I will point out the similarities between UNIX and Node.js.

In 1964 Douglas McIlroy, one of the original UNIX authors, wrote a memo in which you can read:

“We should have some ways of coupling programs like garden hose — screw in another segment when it becomes necessary to massage data in another way. This is the way of IO also.”

In UNIX, a program is a small unit of work that does one thing and does it well. You can then compose two or more of these together to create another reusable program.

As a simple example, let’s say that we have a compressed text file named archive.txt.gz, and that we want to count the number of occurrences of the word “babel”. This is one way to achieve this in UNIX (the exact word-splitting commands may vary; here tr breaks the text into one word per line and grep keeps only the word we want):
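    cat archive.txt.gz | gunzip | tr -cs 'A-Za-z' '\n' | grep -i '^babel$' | wc -l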

This command makes use of the pipeline construct (originally developed by Douglas McIlroy himself, fulfilling his own prophecy), which can be boiled down to the following:

Each program has two main streams of data: the input stream and the output stream. You can pipe the output stream of one program into the input stream of another by separating them with the pipe (|) operator.

Even if this archive.txt.gz file is large, this poses no problem to our simple UNIX program: the cat program outputs the contents of the file in small chunks. Each chunk is sent by the pipe to the gunzip program, which also outputs the uncompressed text chunk by chunk. The passing and transformation of the chunks continues down the chain, and at the end we have the wc program that counts the occurrences and outputs the total when the input stream closes.

The Node.js Way

Here is a Node.js program that achieves the same goal. The listing below is a sketch of the approach described in the rest of this section: it uses the core fs and zlib modules together with the third-party split and through modules, and details such as the exact regular expressions are illustrative:
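    var fs = require('fs');
    var zlib = require('zlib');
    var split = require('split');
    var through = require('through');

    var count = 0;

    fs.createReadStream('archive.txt.gz')   // readable stream
      .pipe(zlib.createGunzip())            // transform: gunzip
      .pipe(split())                        // transform: re-chunk into lines
      .pipe(through(function (line) {       // transform: strip punctuation
        this.queue(line.replace(/[^A-Za-z\s]/g, ''));
      }))
      .pipe(through(function (line) {       // transform: separate words
        var self = this;
        line.split(/\s+/).forEach(function (word) {
          if (word) self.queue(word);
        });
      }))
      .pipe(through(function (word) {       // transform: filter the word we want
        if (word.toLowerCase() === 'babel') this.queue(word);
      }))
      .pipe(through(function () {           // transform: count the matches
        count++;
      }, function () {
        this.queue(count + '\n');           // emit the total once the input ends
        this.queue(null);
      }))
      .pipe(process.stdout);                // writable stream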

In Node.js, instead of programs, you have streams. A stream can be readable, writable, or both. A transform stream is a stream that performs a transformation on its input and outputs the result. In this example, we create one readable stream, one writable stream, and six transform streams, and glue them together at the very end.

To begin, we create a readable file stream that reads from the gzipped source file. This stream emits the file contents chunk by chunk until the file ends.

After creating the readable stream, we use the zlib module to create a gunzip transform stream: you write gzip-compressed data into it, and it outputs the original uncompressed text.

From here on we use a third-party module named through, which helps us create a transform stream given a transformation function. Here we use it to create one stream that strips punctuation, another that separates words, one that filters the words and another that counts them.

In between we also use the split module to create a stream that splits the input at newline characters, which prevents the same word from spanning two different chunks.
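To make the through and split APIs concrete, here is a minimal, self-contained sketch that is separate from the word-counting program; it reads standard input and upcases it line by line (the uppercase transformation is purely illustrative):

    var split = require('split');
    var through = require('through');

    process.stdin
      .pipe(split())                      // re-chunk the input into lines
      .pipe(through(function (line) {     // write function: called once per chunk
        this.queue(line.toUpperCase() + '\n');
      }, function () {                    // end function: called when the input ends
        this.queue(null);                 // signal end-of-stream downstream
      }))
      .pipe(process.stdout);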

Then we wire up all of these streams using the source.pipe(destination) construct, which works much like the UNIX pipe: each chunk read from the source stream is written to the destination stream. And since pipe() returns the destination stream, these calls can be chained.

To make this example work you need to install the split and the through modules using npm:
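    npm install split through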

You can then save this script into a file named count_babel.js and run it:
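    node count_babel.js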

Conclusion

As in the UNIX example, even if your source file is big, Node.js does not load the entire file into memory. Instead, each small chunk travels through the pipeline individually, minimizing latency and memory consumption.

By splitting the algorithm into many streams, you are able to compose your program out of several independent units. And by using, supporting, and promoting this abstraction at its core, Node.js has given rise to an ecosystem of third-party stream modules that can be picked and combined efficiently.

Streams are one of the many techniques for writing scalable and performant Node.js programs. You can learn these and other skills directly from renowned Node.js developers by watching the workshops and sessions from the O’Reilly Fluent Conference.

See below for more Node.js resources from Safari Books Online.

Read these titles on Safari Books Online

Not a subscriber? Sign up for a free trial.

As Node.js’s popularity grows, the “Holy Grail of Web Development” is within reach—writing application code once and executing it both on the server and in the browser! The Node.js Sessions–2012 Fluent Conference allows you to view workshops and sessions from O’Reilly’s 2012 Fluent Conference that deal specifically with server-side development. Quickly discover how Node.js is reshaping the Web. You’ll learn new skills you can apply immediately.
Node.js is a powerful and popular new framework for writing scalable network programs using JavaScript. Professional Node.js begins with an overview of Node.js and then quickly dives into the code, core concepts, and APIs. In-depth coverage pares down the essentials to cover debugging, unit testing, and flow control so that you can start building and testing your own modules right away.
This book shows you how to transfer your JavaScript skills to server-side programming. With simple examples and supporting code, Node Cookbook talks you through various server-side scenarios, often saving you time, effort, and trouble by demonstrating best practices and showing you how to avoid security faux pas.
Node Web Development gives you an excellent starting point straight into the heart of developing server-side web applications with Node. You will learn, through practical examples, how to use the HTTP Server and Client objects, the Connect and Express application frameworks, and the algorithms for asynchronous execution, and how to use both SQL and MongoDB databases.

About the author

Pedro Teixeira is a geek, programmer, freelancer, and entrepreneur. He is the author of several Node.js modules and books about Node.js, the host of the Node Tuts screencast show, and an overall fervent proclaimer of the Node.js creed. He is also the Co-founder and Partner of The Node Firm, and the organizer of the Lisbon JavaScript Conference.

Tags: Douglas McIlroy, Node.js, pipe, Streams, unix
