Building for Recovery

The most common way to fix Web site faults, or even Windows failures, is to reboot. Restarting the application alone takes about 10 seconds; restarting the entire system takes a couple of minutes. The Stanford team is working on mechanisms to reduce the granularity of the reboot by micro-rebooting only the subcomponents that need it. Because smaller components restart faster, a user would experience a 3-second delay followed by resumption of normal service instead of seeing an error message.
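As a rough illustration of why finer reboot granularity shortens recovery, the sketch below models an application as a tree of restartable components and compares restarting one faulty subcomponent with restarting everything. It is not the Stanford team's implementation: the `Component` class, the `micro_reboot` function, the component names, and the restart times are all hypothetical and chosen only to make the comparison visible.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Component:
    """A restartable unit. restart_seconds is an illustrative cost, not a measurement."""
    name: str
    restart_seconds: float
    children: List["Component"] = field(default_factory=list)
    healthy: bool = True

    def find(self, name: str) -> Optional["Component"]:
        """Return the component called `name` in this subtree, or None."""
        if self.name == name:
            return self
        for child in self.children:
            found = child.find(name)
            if found is not None:
                return found
        return None

    def reboot(self) -> float:
        """Restart this component and everything beneath it; return the time spent."""
        total = self.restart_seconds
        for child in self.children:
            total += child.reboot()
        self.healthy = True
        return total


def micro_reboot(root: Component, faulty_name: str) -> float:
    """Restart only the faulty subcomponent instead of the whole tree."""
    target = root.find(faulty_name)
    if target is None:
        raise KeyError(faulty_name)
    return target.reboot()


# Hypothetical application: names and timings are assumptions for illustration.
app = Component("app", 7.0, [
    Component("cart", 0.5),
    Component("search", 0.5),
    Component("checkout", 1.0),
])

app.find("cart").healthy = False
micro_cost = micro_reboot(app, "cart")   # restarts only "cart"
full_cost = app.reboot()                 # restarts "app" and every child
print(f"micro-reboot: {micro_cost}s, full reboot: {full_cost}s")
```

In this toy model the cost of recovery depends on the size of the subtree that is restarted, which is the property micro-rebooting relies on. A real system would also need to isolate component state so that a restarted subcomponent can rejoin safely.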

To help analyze complex malfunctions, the ROC team is building a technology called PinPoint that determines which components are at fault. Every time someone ...
