Methodology for Performance Consulting

Here is a methodology I use when beginning to look at a web site’s performance problems. It reflects the point of view of an outside consultant brought in for a fresh look at the problems.

First, start a log and write down the date, who the customer is, and what the approximate problem is. Whatever you do or talk about, write it down. Not only will this help you when you need to remember passwords and what exactly you changed, but it is also useful for writing reports later.

Next, resist the urge to just start twiddling parameters. The webmaster you’re working with may tell you what he or she thinks is wrong and ask you to change something right off. Just say no. That’s a rathole you don’t want to enter. You need to do some analysis first. Likewise, if the webmaster asks for your first impression, don’t respond. If you do, he or she is quite likely to go into that rathole alone and drag you in behind. Point out that there are no magic bullets (or, at least, very few of them), and that slow and steady wins the race.

Write down, in as much detail as you can, what the perceived problem is and what the webmaster or users think caused the problem. Listen hard. Remember that performance problems always come down to unhappy people somewhere, and that success means making them happy rather than resolving all technical issues.

Ask what performance data has been collected so far. Has the customer run any benchmarks, or tried any solutions? Benchmark and duplicate the problem yourself. Get a good topology diagram with servers and connections clearly marked.
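A minimal sketch of duplicating the problem yourself might look like the following. It times a series of sequential requests and reports the spread. The function names and the example URL are illustrative assumptions, not part of any particular benchmarking tool, and a real benchmark would also need concurrent load.

```python
import statistics
import time
import urllib.request


def fetch_once(url, timeout=10.0):
    """Fetch url once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()  # include the body transfer in the measurement
    return time.perf_counter() - start


def summarize(latencies):
    """Return count, min, median, and max for a list of latencies in seconds."""
    ordered = sorted(latencies)
    return {
        "count": len(ordered),
        "min": ordered[0],
        "median": statistics.median(ordered),
        "max": ordered[-1],
    }


def run_benchmark(url, count=20, fetch=fetch_once):
    """Issue `count` sequential requests and summarize their latencies."""
    return summarize([fetch(url) for _ in range(count)])
```

Calling run_benchmark("http://localhost:8080/slow-page", count=20), where the URL is a placeholder for the page users complain about, gives numbers to write in the log next to the date and the configuration under test.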

Consider changing the highest levels first, that is, the architecture of what the customer is doing, and identify steps that could possibly be eliminated. Low-level tuning should be saved for much later, because the gains are smaller, and any work you put into low-level tuning may be wiped out by architecture changes.

The most likely suspects for performance problems are home-grown CGI or server API applications, high-level architecture, the Internet, and hard disks. Try running a benchmark when no other users or processes are on the system, perhaps late at night, to find out what the best possible performance of the current configuration is. This helps clarify the difference between bad applications and excessive load on the system. If performance is bad for a single user with no network load, then the problem probably lies in the application. If performance is only intermittently bad, look for low-efficiency error handling by the application. Profile the application and really look at the results, because intuition about bottlenecks is often wrong. Look for memory leaks by monitoring process size.
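One way to monitor process size for leaks is to sample the resident set size at intervals and look at the trend. The sketch below assumes a Unix-like system where `ps -o rss= -p PID` is available. The helper names are hypothetical, and a steadily positive slope is only a hint of a leak, not proof.

```python
import subprocess
import time


def rss_kb(pid):
    """Return the resident set size of pid in KB, as reported by ps."""
    out = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())


def sample_rss(pid, count=10, interval=60.0, read=rss_kb, sleep=time.sleep):
    """Collect `count` size samples, waiting `interval` seconds between them."""
    samples = []
    for i in range(count):
        samples.append(read(pid))
        if i < count - 1:
            sleep(interval)
    return samples


def growth_per_sample(samples):
    """Least-squares slope of size against sample index (KB per sample)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den
```

A process whose slope stays clearly above zero under steady load over many samples is worth profiling more closely. A slope near zero suggests the size has stabilized.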

Analyze the OS and hardware with the diagnostic tools at your disposal, again starting from the highest levels. Look at the server log files for errors. Run whatever performance tools your server system has and look at CPU, disk, and memory usage. Look at router and switch configuration and throughput. Check physical cable connections and look for kinks or sources of interference. Get performance statistics from your ISP.
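As a rough illustration of looking at the server log files for errors, the sketch below counts 4xx and 5xx responses per path. It assumes the access log is in Common Log Format. The regular expression and function name are illustrative, and other log formats would need a different pattern.

```python
import re
from collections import Counter

# Matches the request and status fields of a Common Log Format line, e.g.
# 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /a.html HTTP/1.0" 200 2326
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def error_counts(lines):
    """Count client (4xx) and server (5xx) errors per (status, path) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group("status")[0] in "45":
            counts[(match.group("status"), match.group("path"))] += 1
    return counts
```

Passing an open log file to error_counts and printing counts.most_common(10) shows which URLs fail most often, which is usually a better starting point than reading the log top to bottom.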

Once you have a hypothesis about where the problem is, back up everything, try a solution, and run your benchmarks again. Run whatever tests are needed to make sure you didn’t break anything else. If your solution didn’t help, you need to do some more analysis.
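To judge whether a change helped, it is useful to compare the same benchmark before and after. The sketch below compares median latencies. The function name is hypothetical, and the median is only one reasonable summary among several.

```python
import statistics


def median_change_percent(before, after):
    """Percent change in median latency; negative means the change made things faster."""
    base = statistics.median(before)
    if base == 0:
        raise ValueError("baseline median is zero")
    return (statistics.median(after) - base) / base * 100.0
```

For example, a baseline median of 2.0 seconds and a post-change median of 1.0 second yields -50.0, a 50 percent reduction. A result near zero suggests the hypothesis was wrong and more analysis is needed.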
