Chapter 9. Logfiles and Monitoring

Hacks 78–88: Introduction

The only thing worse than disastrous disk failures, runaway remote hosts, and insidious security incidents is the gut-wrenching feeling that comes with the realization that they probably could’ve been avoided.

Often the best tool for averting catastrophe is access to data that lets you act before a problem becomes an outage. Whether it’s having a disk tell you when it’s about to expire or being informed of network or service outages, tools that aggregate data and alert you to anomalies are invaluable to system and network administrators. The goal of this chapter is to show you how to get data you don’t currently have, and how to put the data you do have to better use.

Avoid Catastrophic Disk Failure

Access your hard drive’s built-in diagnostics using Linux utilities to predict and prevent disaster.

Nobody wants to walk in after a power failure only to discover that, on top of everything else, a dead hard drive means rebuilding entire servers and restoring backed-up data from tape. Of course, the best way to avoid this situation is to be alerted when something is amiss with your SCSI or ATA hard drive, before it finally fails. Ideally, the alert would come straight from the hard drive itself, but until we’re able to plug an RJ-45 directly into a hard drive, we’ll have to settle for the next best thing: the drive’s built-in diagnostics. For several years now, ATA and SCSI drives have supported a standard ...
