Multiple Component Failure

Many networks are designed to avoid single points of failure. What usually brings these networks down is multiple component failure. Multiple component failure can be triggered by a root cause such as dirty or unreliable power, or it can just be a fluke. One device can also sometimes cause failures in other devices.

Sun servers used to be programmed to halt when a break character was sent to them. Many people used Cisco routers as terminal servers that allowed remote connections to the consoles of these servers. One issue discovered with this combination was that when the Cisco routers were powered off and on, break signals were sent out on all of the serial lines connected to the Sun consoles. So, if the terminal server failed, all the Sun servers halted. The parallel discovery was that having all the servers offline was not good for business. Sun's newer servers do not halt on a break signal, but if you're using older Sun systems, beware of using Cisco routers as remote console devices.

Tip

See Cisco field notice #15521, titled "Terminal Server Break Character on Cisco Access Servers," for more information on this problem.
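
The terminal server example is a case of one device's failure propagating to others. As a rough illustration only, the sketch below models that kind of dependency as a simple map and walks it to list everything that ends up down. The device names and the `IMPACTS` table are hypothetical, not taken from any real installation, and the function is not a vendor tool.

```python
from collections import deque

# Hypothetical topology (an assumption for illustration): each key is a
# device, and each value lists the devices that halt when that device fails.
IMPACTS = {
    "terminal-server": ["sun-1", "sun-2", "sun-3"],
    "sun-1": [],
    "sun-2": [],
    "sun-3": [],
}


def cascade(impacts, first_failure):
    """Return every device that ends up down, in breadth-first order."""
    down = [first_failure]
    seen = {first_failure}
    queue = deque([first_failure])
    while queue:
        device = queue.popleft()
        # Devices missing from the map are treated as affecting nothing else.
        for victim in impacts.get(device, []):
            if victim not in seen:
                seen.add(victim)
                down.append(victim)
                queue.append(victim)
    return down


if __name__ == "__main__":
    print(cascade(IMPACTS, "terminal-server"))
```

Running a walk like this against an inventory of console, power, and management dependencies can help show where a supposedly redundant design still shares a single trigger.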

Sometimes, multiple devices fail for reasons known only to them. In one installation, I saw a dual-supervisor 6509 suffer a compound failure in which the primary supervisor failed but the primary MSFC stayed active. Because the MSFC is tied physically to the supervisor to get connectivity to the networks in the switch, the entire ...
