A.7. Reliability

Hardware systems, unlike software, wear out and fail over time. When software works under a particular condition, it tends to work under that same condition indefinitely, but hardware can work under a particular condition one day and fail the next. These failures can often be repaired and the unit put back into service, although the very act of repair tends to reduce future reliability. The rate of failure in a population of systems is often measured in terms of the mean time between failures (MTBF). If availability matters more than the paucity of failures, it can be measured in terms of uptime, which is a function of the mean time between failures and the mean time to repair (MTTR). A test for reliability is often referred to as a "reliability demonstration" or "MTBF demonstration."
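The relationship between uptime, MTBF, and MTTR can be made concrete with a small sketch. The text does not say which formula it has in mind; the code below assumes the commonly used steady-state form, availability = MTBF / (MTBF + MTTR), with MTBF read as the mean operating time between failures. The function name and the figures in the usage lines are hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the long-run fraction of time a unit is up.

    Assumes the common formula MTBF / (MTBF + MTTR); the source text only
    says uptime is "a function of" these two quantities.
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures: a unit that fails every 10,000 hours on average
# and takes 8 hours on average to repair.
uptime_fraction = availability(10_000.0, 8.0)
print(f"Availability: {uptime_fraction:.5f}")
```

Under this formula, halving the repair time improves uptime about as much as doubling the MTBF, which is why availability targets are sometimes easier to meet through faster repair than through fewer failures.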

One common technique for such a test is to gather a fairly large number of units, power them up, and then leave them in a running state, with some load test cycling indefinitely, under periods of maximum operating temperature and humidity. Given a desired level of statistical MTBF in the production units, some small number of units can fail during the demonstration period, which generally lasts weeks. The higher the level of reliability to be demonstrated, the larger the sample, the longer the demonstration, and the fewer the units that can fail. These tests are based on an assumption that failure rates are constant over time, ...
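The trade-off among sample size, duration, and allowed failures can be sketched numerically. The code below is an illustration under stated assumptions, not the procedure the book prescribes: it takes the constant-failure-rate assumption mentioned above, treats the number of failures in a fixed amount of accumulated unit-hours as Poisson distributed, and finds the total unit-hours needed to demonstrate a target MTBF at a chosen confidence level when at most a given number of failures occur. The function names, the confidence level, and the sample figures are all hypothetical.

```python
import math


def poisson_cdf(max_failures: int, mean: float) -> float:
    """P(N <= max_failures) for a Poisson-distributed failure count N."""
    term = math.exp(-mean)
    total = term
    for k in range(1, max_failures + 1):
        term *= mean / k
        total += term
    return total


def required_unit_hours(target_mtbf: float, confidence: float,
                        allowed_failures: int) -> float:
    """Total unit-hours for a time-terminated demonstration.

    Assumes a constant failure rate. Finds T such that, if the true MTBF
    were only target_mtbf, seeing allowed_failures or fewer failures in T
    unit-hours would happen with probability 1 - confidence.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    if target_mtbf <= 0 or allowed_failures < 0:
        raise ValueError("target_mtbf must be positive, failures non-negative")
    risk = 1.0 - confidence
    lo, hi = 0.0, 1.0
    # The Poisson CDF falls as the mean grows, so expand hi until it brackets.
    while poisson_cdf(allowed_failures, hi) > risk:
        hi *= 2.0
    for _ in range(200):  # bisection on T / target_mtbf
        mid = (lo + hi) / 2.0
        if poisson_cdf(allowed_failures, mid) > risk:
            lo = mid
        else:
            hi = mid
    return hi * target_mtbf


# Hypothetical plan: 50 units running around the clock, 10,000-hour target
# MTBF, 90% confidence.
units = 50
for failures in range(3):
    hours = required_unit_hours(10_000.0, 0.90, failures)
    print(f"{failures} failures allowed: {hours / (units * 24):.1f} days")
```

With zero allowed failures the result reduces to the closed form T = -MTBF × ln(1 - confidence). Because the requirement is expressed in unit-hours, adding units shortens the calendar time, and tolerating more failures lengthens it.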
