CHAPTER 6. A Reliability Management Toolbox

Reliability, as defined in this book, is about avoiding system problems, making core processes more fault-tolerant, or, when problems do occur, having capabilities in place so that symptoms are spotted early and corrective action is taken before the downstream effects become too severe. A system’s reliability is ultimately measured by the end user’s experience: their exposure to unexpected interruptions, failures, and general instability. In extreme cases, reliability problems can lead to availability issues as failures take down whole components, although we’ll focus specifically on that later. In summary, some practical examples of reliability management tasks include the following:

Proactive ...
