Introduction: The Hallowed Halls

“Our great computers / Fill the hallowed halls”

Neil Peart, Rush, 2112

Quite soon, the world’s information infrastructure is going to reach a level of scale and complexity that will force scientists and engineers to think about it in an entirely new way. The familiar notions of command and control, which have held temporary dominion over our systems, are now being thwarted by the realities of a faster, denser world of communication, a world where choice, variety, and indeterminism rule. The myth of the machine that does exactly what we tell it has come to an end.

Many of the basic components of information infrastructure, including computers, storage devices, networks, and so forth, are by now relatively familiar; however, each generation of these devices adds new ways to adapt to changing needs, and what happens inside them during operation has influences that originate from all over the planet. This makes their behaviour far from well understood, even by the engineers who build and use them. How does that affect their predictability, and hence their usefulness?

It is now fair to draw a parallel between the structures we build to deliver information, and the atomic structure of materials that we engineer for manufacturing, like metals and plastics. Although these things seem far removed from one another, occupying very different scales, they have a basic similarity: physical materials are made of atoms and molecules networked through chemical bonding into structures that give them particular properties. Similarly, information infrastructures are made up of their own components wired into networks, giving them particular properties. Understanding what those properties are, and why they come about, requires a new way of thinking. In this book, I will attempt to explain why this analogy is a fair one, why it is not the full story, and what we can learn from the limited similarity.

At the engineering level, we put aside the details of why a structure behaves as it does, and instead make use of what properties the materials promise. We become more concerned with how to use them, as ‘off-the-shelf’ commodities, and describe their promises with new terms like strength, elasticity, plasticity, and so on, qualities far removed from raw atoms and chemical bonds. The useful properties of materials and infrastructure lie as much in the connections between parts as in the parts that get connected, but we prefer to think of them as continuous, reliable stuff, not assemblages of tiny pieces.

The parallels between information infrastructure and the physics of matter go deeper than these superficial likenesses, and I have studied some of these parallels in my research over the past twenty years. This book will describe a few of them. Understanding this new physics of information technology is going to be vital if we are to progress beyond the current state of the art. This is not the physics of silicon wafers, or of electronics, nor is it about ideas of software engineering; rather, it’s about a whole new level of description, analogous to the large-scale thermodynamics that enabled the steam age, or the economic models of our financial age. Only then will we be able to build predictably and robustly, while adapting to the accelerating challenges of the modern world.

What does this mean to you and me? Perhaps more than we might think. Every search we perform on the Internet, every music download piped to our earplugs from an embedded mobile hotspot, or more speculatively every smartphone-interfacing self-driving hybrid car purchased with a smart-chip enabled credit card, sends us careering ever onwards in the uncontrolled descent of miniaturization and high density information trawling that we cheer on as The Information Revolution.

We no longer experience information-based technology as thrilling or unusual; the developed world has accepted this trajectory, expects it, and even demands it. In the developing world, it is transforming money, communications and trade in areas where more traditional alternatives failed for a lack of physical infrastructure. The ability to harness and process information at ever greater speeds allows the whole world to fend off threats and exploit opportunities.

As information technology invades more and more parts of our environment, we are going to experience it in unexpected ways. We won’t always see the computer in a box, or the remote control panel, but it will be information technology nevertheless. New smart materials and biologically inspired buildings already hint at what the future might look like. Yet, if we are to trust it in all of its varied forms, we have to understand it better.

For years, we’ve viewed information technology as a kind of icing on our cake, something wonderful that we added to the mundane fixtures of our lives, with its entertainment systems and personal communications channels; but then these things were no longer the icing: they were the cake itself. Information systems began to invade every aspect of our environments, from the cars we drive to the locks on the front door, from the way we read books to the way we cook food. We now depend on them for our very survival.

Artificial environments support more of our vital functions with each day that passes. Few in the developed world can even remember how to survive without the elaborate infrastructure of electricity, supply networks, sanitation plants, microwave ovens, cars and other utilities. So many of us rely on these for our day-to-day lives. Some have hearts run by pacemakers, others are reliant on technology for heating or cooling. Our very sense of value and trade is computed by technology, which some parts of our planet would consider magic. We ‘outsource’ increasing amounts of our survival to a ‘smart’ environmental infrastructure, and hence we become ever more dependent on it.

What makes us think we can rely on all this technology? What keeps it together today, and how might it work tomorrow? Will we even know how to build the next generation of it, or will we become lulled into a stupor of dependence brought about by its conveniences? To shape the future of technology, we need to understand how it works, else what we don’t understand will end up shaping us.

As surely as we have followed the trajectory, we have come to rely on it, and thus we must know what it takes to make the next steps, and why. Some of those details are open to choice, others are constrained by the physical nature of the world we live in, so we’ll need to understand much more than just a world of computers to know the technology of tomorrow.

Behold the Great Transformation, not just of technology but of society itself, adapting to its new symbiosis! It is happening now, at a datacentre near you! (Or in a test tube, or under an etching laser.) Vast halls of computing power, and laboratories of microscopic bio-chemical machinery, have supplanted the mechanisation and civil engineering of the industrial age, as the darlings of change. What will be the next thing? Nanotechnology? Human enhancement? Where will our sense of adventure set out next?

In 1997, I visited San Diego, California, for the 11th Annual Conference on Large Installation System Administration, as something of an outsider. It was a conference not for the designers of information systems, but for those who keep such systems running, on behalf of society at large. As a physicist by training, relatively new to technological research, I was excited to present some work I’d been doing on new ways to ensure the reliability of large computer systems. I presented a rather technical paper on a new kind of smart process-locking mechanism to make computers more predictable and maintainable.

To a physicist, reliability looks like a kind of stability. It is about enabling an equilibrium to come about. To me, the work I presented was just a small detail in a larger and more exciting discussion about making computer systems self-governing, as if they were as ordinary a part of our infrastructure as self-regulating ventilation systems. The trouble was, no one was having this discussion. I didn’t get the response I was hoping for. A background in science had not prepared me for an audience with different expectations. In the world of computers, people still believed that you simply tell computers what to do, and, because they are just machines, they must obey.

I left the conference feeling somewhat misunderstood. My paper related to a piece of software called CFEngine that I had started developing in 1993 to configure and maintain computers without human intervention. It had become unexpectedly popular, spreading to millions of computers in datacentres and small environments, and it is still widely used today.

On the plane going home, my misery was compounded when I became ill, and I began to think about the human immune system and how smart it seemed to be in repairing a state of health. There was an answer! I became inspired to explain my work through the analogy of health, and I spent much of the year thinking about how to write a paper, ‘Computer Immunology’, which I submitted to the next conference, spelling out a manifesto for building self-healing computers.

The following year, 1998, the conference was in Boston, Massachusetts. This time, I was delighted to win the prize for best paper at the conference, and was immediately thrust into a world keen to know about the concepts of self-regulating, self-healing computers—though it would take another ten years for the ideas to become widely recognized. The experience underlined the importance of bridging the awareness gap between cultures in different fields, even in science and technology. It underlined, perhaps, a need for books like this one.

After the conference, I was taken on a trip of honour by a former colleague, Demosthenes Skipitaris, from Oslo University College, to a high-security, state-of-the-art datacentre facility run by a Norwegian search engine called FAST, just outside of Boston. After surrendering my passport on the way in for security validation, and being frisked by over-dressed guards, we were led into a vast hall of computer racks the size of several football pitches.

Computers on top of computers, accompanied by the deafening noise of thousands of fans whirring, disks spinning and air blowing, all mounted in racks and stacked up to the ceiling. Row upon row of black boxes, separated by narrow walk-spaces, just wide enough for a person, for as far as the eye could see. We were listening to the roar of all the web searches of people from across the world being processed before our eyes, but there were no humans in sight. In the whole building I saw a single keyboard and a single screen for emergency use.

“All this is run by your software CFEngine,” my host told me. CFEngine is essentially a collection of software robots embedded into each machine’s operating system. I told him about my computer health analogy, and he commented that, with the software running, the most common failure was that the vibration from all the spinning disks would cause the removable drives to work their way out of the chassis, stopping a machine now and then. That was the only time a human needed to touch the machines—just push the disk back in and restart.
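To make the idea of a software robot concrete, here is a toy sketch, in Python, of the kind of convergent maintenance loop such an agent runs. This is an illustration of the principle only, not CFEngine’s actual code; the service name, commands, and interval are hypothetical examples.

```python
import subprocess
import time

# Each "promise" pairs a check of a desired state with a repair action.
# If the check passes, the agent does nothing; if not, it repairs,
# nudging the machine back towards its desired state.
PROMISES = [
    {
        "desc": "sshd process is running",   # hypothetical example service
        "check": lambda: subprocess.run(
            ["pgrep", "-x", "sshd"], capture_output=True
        ).returncode == 0,
        "repair": lambda: subprocess.run(["/usr/sbin/sshd"]),
    },
    # ... further promises: file permissions, mounts, disk space ...
]

def converge():
    """One immune-system-style pass: detect deviations and repair them."""
    for promise in PROMISES:
        if not promise["check"]():
            print(f"Deviation detected: {promise['desc']} -- repairing")
            promise["repair"]()

if __name__ == "__main__":
    while True:          # runs unattended, with no human in the loop
        converge()
        time.sleep(300)  # re-verify the system every five minutes
```

The essential design choice is that each repair is idempotent: the agent can apply it any number of times without harm, so the system drifts back towards its desired state however it was perturbed.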

Then, as we passed one of the racks, he pointed to a small cable emerging from a socket. “That,” he said, “is our single point of failure. If I pull that plug, we’re offline.” We stopped there for a moment to pay our respects to the gods of fragility.

It was a telling reminder that, even with the most advanced systems at our fingertips, the smallest detail can so easily be overlooked and result in a fatal flaw.

How would we even know about a tiny error, a minute design flaw, an instability waiting to grow into a catastrophic failure? Standing amongst the anonymous array of whirring machines in that hallowed hall, it was evident that finding a needle in a haystack might be easy by comparison. In a world of software, there is nothing even to get hold of, no prick of a needle to feel.

In 2012, I visited a datacentre ten times the size of the one in Boston, and was shown a room where 40 humans still sat and watched screens, in what looked like an enactment of NASA’s mission control. They were hoping to catch advance warning signs of trouble in their global operations, still using the methods of a decade before in a pretence of holding together a system that was already far beyond their ability to comprehend—as if a few shepherds were watching over all of the wildlife in Earth’s oceans with their crooks. Those 40 humans watched, with the naked eye, graphs of performance, somewhat like medical monitors, for tens of thousands of computers spread around the globe.

I recall thinking how primitive it all was. If a single machine amongst those tens of thousands were to develop an instability, it could bring everything to a halt, in the worst case, like pulling out an essential plug. Watching those screens was like trying to locate a single malignant cell in a patient’s body just by measuring a pulse. The mismatch of information was staggering. I began to wonder what role CFEngine’s immune principles already played in preventing that from happening. I hope that this book can help to shed some light on what makes a system well-behaved or not.

For two decades, the world’s most advanced datacentres have been run largely by automated robotic software that maintains their operational state with very little human intervention. Factories are manned by robotic machines, and familiar services like banking, travel, and even vending machines have been automated and made more widely available than ever before. The continuity of these services has allowed us to trust them and rely on them.

How is this even possible? How is it that we can make machines that work without the need to coax and cajole them, without the need to tell them every detail of what to do? How can we trust these machines? And can we keep it up? So far, we’ve been lucky, but the long term answers have yet to be revealed. They can only emerge by knowing the science behind them.

If information systems are going to be mission critical in the society of today and tomorrow, then the mission controls of this increasingly ‘smart’ infrastructure need principles more akin to our autonomous immune systems than to nurses with heart monitors. We have to understand how continuous operation, how dependability itself, can emerge from the action of lots of individual cellular parts, and follow a path of continuous adaptation and improvement. Even to grasp such speed, scale and complexity is beyond any human without technological assistance.

We build software systems every day, and extend the depth of this dependency on technology. We suffer sometimes from the hubris of believing that control is a matter of applying sufficient force, or a sufficiently detailed set of instructions. Or we simply hope for the best. How shall we understand the monster we are creating, when it is growing so rapidly that it can no longer be tethered by simple means, and it can no longer be outsmarted by any human?

Such a Frankensteinian vision is not as melodramatic as it sounds. The cracks in our invulnerability are already showing as tragedies emerge out of insufficiently understood systems. We have thrown ourselves into deep water with only a crude understanding of the survival equipment. Clearly, in the worst case, the adventure could go badly. Luckily, this need not happen if we build using the best principles of science, and base technology on proper knowledge about how the world really works, adapting to its limitations.

This book is about that science, and how we may use it to build reliable infrastructure. It is about the tension between stability, witting and unwitting, and pride in a sense of control and certainty. In a sense, it is the story of how I conceived CFEngine, or if you prefer, of how to implement Computer Immunology.

The book is in three parts:

  • Part I: describes the fundamentals of predictability, and why we have to give up the idea of control in its classical meaning.

  • Part II: describes the science of what we can know, when we don’t control everything, and how we make the best of life with only imperfect information.

  • Part III: explains how the concepts of stability and certainty may be combined to approach information infrastructure as a new kind of virtual material, restoring a continuity to human-computer systems so that society can rely on them.

I have chosen to focus especially on the impact of computers and information on our modern infrastructure, yet the principles we need for managing technology did not emerge from computer science alone. They derive from an understanding of what is all around us, in the natural world. The answers have emerged from all those sciences that learned to deal with fundamental uncertainty about the world: physics, chemistry, economics and biology. Emergent effects, so often mysticized, are really not so mysterious once one takes the time to understand. They are inevitable consequences of information-rich systems. We must understand how to harness them for safe and creative purpose.

When civil infrastructure meant gas lamps and steam locomotives, and computers were still a ghost in the punch-card looms of the industrial revolution, the denizens of history wrestled with fundamental questions about the nature of the physical world that we still find hard to comprehend today. Without those questions and the discoveries they led to, none of what we take for granted today would have been possible.

So, to unearth the roots of this story about technological infrastructure, I want to delve into the roots of science itself, into the principles that allow us to understand system operation and design, to reveal the advances in thinking that led to modern, information rich methods of fending off uncertainty. Above all, this is a fascinating story of human endeavour, from a personal perspective. Just beyond reach of most of us, there is a treasure trove of understanding that has propelled humanity to the very limits of imagination. That is surely a story worth telling.

How to read this book

This book introduces concepts that will be new to a majority of readers. To do so, it starts from fundamental ideas that might initially seem to stray from the topic of information, and builds towards the more contemporary aspects of information infrastructure today. Some readers might be impatient to get to the final answers without all the basics, but science does not work that way. I have committed myself to explaining, as plausibly as I can, how this edifice of thought emerged, with cultural and historical context, in the hope that it will make sense to the committed reader. The panorama and intricacies of scientific thought are truly rewarding for those who are willing to climb the mountain.

I have provided a level of technical depth for readers who are conversant with science and technology. However, no one should feel defeated by these details: it should be straightforward to skip across those that seem too difficult, and rejoin the train of thought later. I encourage readers to exercise personal judgement, and I cross my fingers that the book might still be enjoyed without every nuance rendered with complete fidelity.

— Mark Burgess, Oslo, 2013
