Objective 4: General Troubleshooting

Troubleshooting is one of the most difficult but satisfying administration tasks. Few things feel better for a professional administrator than being presented with a problem and finding a solution for it. Successful troubleshooting requires an in-depth knowledge of the system being shot at and a few rules of conduct. In this section we'll go over a few tips on how to troubleshoot and suggest places for you to look for information.

The first rule of troubleshooting is: don't jump to conclusions! Just because someone says "foo stopped working," don't start adjusting the parameters for whatever "foo" is without gathering more information. Initial problem descriptions (especially from nontechnical users) are notoriously misleading.

This leads to rule number two: get a complete and accurate description of the problem. Foo may very well have stopped working, but that could be a side effect of bar being misconfigured.

Rule number three: reproduce the problem. It's very difficult to shoot at a problem you can't see. The hardest problems are intermittent, but luckily, most aren't. Most intermittent problems show a pattern over time.
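One way to look for such a pattern is to tally when the failures were reported. The following is only a minimal sketch: the function name failures_by_hour and the assumption that you have collected failure times as ISO-format timestamp strings are illustrative, not something this section prescribes.

```python
from collections import Counter
from datetime import datetime


def failures_by_hour(timestamps):
    """Count failure reports per hour of day.

    `timestamps` is assumed to be an iterable of ISO-format strings,
    for example "2024-03-01T02:05:00" (a hypothetical input format).
    """
    hours = Counter()
    for stamp in timestamps:
        # datetime.fromisoformat parses the string; .hour is 0-23
        hours[datetime.fromisoformat(stamp).hour] += 1
    return hours
```

Calling most_common() on the returned Counter lists the busiest hours first. A cluster around one hour might point at a scheduled job, a backup window, or a daily load peak, but that is a lead to investigate rather than a conclusion.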

So, you've followed the Three Rules of Troubleshooting. Where do you look to gather more information?

If you suspect hardware problems, dmesg and its associated log file /var/log/dmesg are a good place to start. dmesg shows you the kernel ring buffer, the buffer that the kernel writes messages to. (It's called a ...
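As a rough illustration of scanning the ring buffer for signs of hardware trouble, the sketch below runs dmesg and keeps only the lines that mention a few keywords. The keyword list and the names suspicious_lines and read_dmesg are assumptions made for this example, and on some systems reading the kernel ring buffer may require root privileges.

```python
import subprocess

# Hypothetical keyword list; adjust it for the hardware you suspect.
KEYWORDS = ("error", "fail", "timeout", "i/o")


def suspicious_lines(dmesg_text, keywords=KEYWORDS):
    """Return the lines of dmesg output that mention any keyword.

    Matching is a case-insensitive substring test, so it is deliberately
    crude: expect some false positives.
    """
    hits = []
    for line in dmesg_text.splitlines():
        lowered = line.lower()
        if any(word in lowered for word in keywords):
            hits.append(line)
    return hits


def read_dmesg():
    """Run dmesg and return whatever it printed to standard output."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=False
    )
    return result.stdout
```

Printing each line of suspicious_lines(read_dmesg()) gives a shorter list to read through first. It is a starting point only; the full dmesg output is still the authoritative record.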

