Protecting Against System Crashes

You can take several approaches to protect your system against the ill effects of system crashes, including the following:

  • Providing component redundancy

  • Using Real Application Clusters (formerly named Oracle Parallel Server)

  • Using Transparent Application Failover software services

Component Redundancy

As basic protection, the hardware components that make up the database server itself must be fault-tolerant. Fault tolerance, as the name implies, allows the overall hardware system to continue operating even if one of its components fails. This capability, in turn, requires redundant components and the ability to detect a component failure and seamlessly bring the failed component’s replacement into service. The major system components that should be fault-tolerant include the following:

  • Disk drives

  • Disk controllers

  • CPUs

  • Power supplies

  • Cooling fans

  • Network cards

  • System buses
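The idea of redundancy plus failure detection can be illustrated with a minimal sketch. It is not how any particular hardware or Oracle product implements failover; the Component and RedundantGroup classes, the component names, and the request string are all hypothetical, chosen only to show a failed component being detected and its redundant partner taking over.

```python
class Component:
    """Hypothetical hardware component that can be marked as failed."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def serve(self, request):
        # A failed component cannot do any work.
        if not self.healthy:
            raise RuntimeError(f"{self.name} has failed")
        return f"{self.name} handled {request}"


class RedundantGroup:
    """Routes work to the first healthy component, skipping failed ones."""

    def __init__(self, components):
        self.components = list(components)

    def serve(self, request):
        for component in self.components:
            try:
                return component.serve(request)
            except RuntimeError:
                # Failure detected; fall through to the next redundant component.
                continue
        raise RuntimeError("all redundant components have failed")


# Two redundant disk controllers (names are illustrative only).
controllers = RedundantGroup([Component("controller-A"), Component("controller-B")])
first = controllers.serve("read block 7")

controllers.components[0].healthy = False  # simulate a controller failure
second = controllers.serve("read block 7")

print(first)
print(second)
```

In this sketch the caller issues the same request before and after the simulated failure and never needs to know which controller answered, which is the sense in which the overall system keeps operating.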

Disk failure is the largest area of exposure for hardware failure, since disks have the shortest mean time between failures of any component in a computer system. Disks also offer the greatest variety of redundancy solutions, so examining disk failure in detail provides the best example of how high availability can be implemented with hardware.

Disk redundancy

Disk failure is the most common cause of system failure. Although the mean time to failure of an individual disk drive is very high, the ever-increasing number of disks used for today’s very large ...
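One way to see why the number of disks matters is a rough calculation. The sketch below assumes that disks fail independently at a constant rate, in which case the expected time until the first failure among N disks is roughly the single-disk mean time to failure divided by N. The 500,000-hour figure and the disk counts are illustrative assumptions, not values from the text, and the function name array_mttf_hours is hypothetical.

```python
HOURS_PER_DAY = 24


def array_mttf_hours(disk_mttf_hours, disk_count):
    """Approximate hours until the first failure among disk_count disks.

    Assumes independent disks with a constant failure rate, so the
    combined failure rate is disk_count times the single-disk rate.
    """
    return disk_mttf_hours / disk_count


single_disk_mttf = 500_000  # hours; illustrative assumption only

for count in (1, 10, 100, 500):
    hours = array_mttf_hours(single_disk_mttf, count)
    print(f"{count:>4} disks: first failure expected after about "
          f"{hours:,.0f} hours ({hours / HOURS_PER_DAY:,.0f} days)")
```

Under these assumptions, a disk that on its own would be expected to run for decades becomes, in a configuration of several hundred disks, part of a system that can expect a disk failure every few weeks.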
