Foreword by Michael Pecht

Two subway trains collide in Washington, D.C., killing nine; an Airbus A330 airliner crashes into the Atlantic Ocean with no survivors; the FAA computer system goes down, paralyzing air traffic in a large region of the U.S. for half a day, for the third time in two years. While these failures dominated the front pages in 2009 and 2010, other major system failures have occurred in telecom network systems, computer systems, data servers, electrical power grids, energy generation systems, and healthcare systems. The costs of such incidents were enormous. In the worst cases, lives were lost and people were injured; in all cases, people were adversely affected. The economic repercussions were also staggering (e.g., in one case, the failure of a point-of-sale information verification system resulted in $5,000,000 per minute in lost sales). Such failures also present, as President Obama noted, “[some] of the most serious economic and national security challenges face[d] as a nation”.

Today's systems perform very important societal functions in such diverse areas as communications, transportation, energy networks, financial transactions, and healthcare. But these systems fail, and the consequences can be serious: transportation paralysis, airplane accidents, electrical power outages, and telecom system crashes, to name a few. Appropriate reliability methods are critical to ensure highly available and safe systems; however, some methods are more beneficial ...

