Web Performance Tuning, 2nd Edition by Patrick Killelea

Monitoring Per-Process Statistics

It is very valuable to know which processes are using the most CPU time or other resources on a remote machine. The rpc.rstatd daemon, which we used earlier, does not provide per-process information. Commercial software, such as Measureware, requires the installation of potentially buggy remote agents to report per-process information, and keeps its data in a proprietary format. It is often unacceptable to install unknown agents on important production machines, and it is never to your advantage to have your data locked into a proprietary format. Here we look at two free alternatives for collecting the same data: first, a web server that runs monitoring tools as CGIs, and second, Telnet driven from a Perl script.

Using CGIs to Run Tools

It is not hard to create a CGI that calls ps, vmstat, sar, top, or other system monitoring programs that were intended to be used only by someone logged in to that machine. For example, using Apache, you could copy /bin/ps to /home/httpd/cgi-bin/nph-ps and suddenly you have a version of ps that is directly runnable from the Web.

You need to rename it with the nph- prefix to tell the Apache web server not to expect or add any headers. If you don’t, the web server will expect a Content-Type header from ps and return an error because ps does not output that header. The nph- prefix tells Apache not to worry about it. (“nph” stands for “nonparsed headers.”) What you then get will not be prettily formatted in a browser, but it will at least be usable by a script that calls that URL. You could also write a CGI that in turn calls ps, adds the appropriate header, and does any other processing you like, but this adds the overhead of yet another new process every time you call it.
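
The wrapper approach described above might look like the following minimal sketch. It assumes that ps lives at /bin/ps and accepts the -e option, and that the script is installed under an ordinary CGI name without the nph- prefix, so that Apache expects the Content-Type header the script supplies. The helper name cgi_response is illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a minimal CGI response: one Content-Type header, a blank line,
# then the body. For an ordinary (non-nph) CGI, the web server adds
# the status line and any remaining headers itself.
sub cgi_response {
    my ($body) = @_;
    return "Content-Type: text/plain\n\n" . $body;
}

# Assumption: ps is at /bin/ps and understands -e (list every process).
my $ps_output = `/bin/ps -e`;

# Backticks return undef if the command could not be run at all.
print cgi_response(defined $ps_output ? $ps_output : "could not run ps\n");
```

The cost is the one the text mentions: each request now starts a Perl interpreter and then ps, rather than ps alone.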

I have used a very tiny web server called mathopd from http://www.mathopd.org/ to run ps as a CGI. There is essentially no documentation of this web server, which is distributed as source code, but it is simple enough. It compiles cleanly on Linux, but requires a few tweaks to compile on Solaris, where it also needs the -lnsl and -lsocket options to gcc in the Makefile. mathopd has the advantage that it is even smaller than Apache, runs as a single process with a single thread, and does not allocate any new memory after startup. I am reasonably confident that mathopd is free of memory leaks, but it is simple to write a cron job that kills and restarts it each night if this is a concern. If you write its log files to /dev/null, it should not use any disk space. I keep a compiled copy on my web site at http://patrick.net/software/.

Using rstat to Trigger Deeper Analysis

Here is a script that watches CPU usage, and if idle time falls below a certain threshold, it hits the URL to run ps as a CGI and mail the result to an administrator, who can determine which of the processes is the offender. It can be run as a cron job, or you can make it into a loop and leave a single process going. (I prefer a cron job because long-running jobs tend to leak memory or get killed.)

#!/usr/local/bin/perl
use LWP::UserAgent;

$thresh  = 50;                # idle CPU threshold (percent), below which we report top 10 procs
$machine = "www.patrick.net";
$\       = "\n";              # automatically append newline to print statements

# grab the cpu usage
$_       = `/opt/bin/rstat $machine`;

($yyyy, $mon, $dd, $hh, $mm, $ss, $usr, $wio, $sys, $idl, $pgin, $pgout, $intr,
$ipkts, $opkts, $coll, $cs, $load) = split(/\s+/);

if ($idl < $thresh) {
   print "CPU idle time on $machine is under $thresh\%";
   $ua = LWP::UserAgent->new;

   $request  = HTTP::Request->new(GET => "http://$machine/nph-ps");
   $response = $ua->request($request);

   if (!$response->is_success) {
       die $request->as_string(), " failed: ", $response->status_line;
   }

   open (MAIL, '|/bin/mail hostmaster@bigcompany.com') or die "Can't run /bin/mail: $!";
   print MAIL 'From: hostmistress@bigcompany.com';
   print MAIL 'Reply-To: hostmistress@bigcompany.com';
   print MAIL "Subject: $machine CPU too high";
   print MAIL "";
   print MAIL "CPU idle time is less than $thresh\% on $machine";
   print MAIL "Here are the top 10 processes by CPU on $machine";
   print MAIL $response->content;
   print MAIL "\nThis message generated by cron job /opt/bin/watchcpu.pl";
   close MAIL;
}
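
The parsing and threshold test in the script above can be pulled into small functions and tried without a live rstat daemon. This is only a sketch: the field order is copied from the split in the script, the names parse_rstat and cpu_is_busy are illustrative, and the sample line is made up rather than real rstat output.

```perl
use strict;
use warnings;

# Field order taken from the split in the script above.
my @FIELDS = qw(yyyy mon dd hh mm ss usr wio sys idl pgin pgout
                intr ipkts opkts coll cs load);

# Turn one line of rstat output into a hash keyed by field name.
# Returns undef if the line does not have the expected number of fields.
sub parse_rstat {
    my ($line) = @_;
    my @values = split ' ', $line;    # split ' ' also skips leading whitespace
    return undef unless @values == @FIELDS;
    my %stat;
    @stat{@FIELDS} = @values;
    return \%stat;
}

# True when idle CPU has fallen below the threshold (both in percent).
sub cpu_is_busy {
    my ($stat, $thresh) = @_;
    return $stat->{idl} < $thresh;
}

# Hypothetical sample line, made up for illustration only.
my $sample = "2002 03 14 10 15 00 55 5 20 20 0 0 120 45 40 0 200 0.15";
my $stat   = parse_rstat($sample);
print "idle=$stat->{idl} busy=", (cpu_is_busy($stat, 50) ? "yes" : "no"), "\n";
```

Checking the field count before using the values keeps a truncated or empty rstat response from being treated as 0% idle, which would otherwise trigger a false alarm.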

Using Telnet from Perl

The problem with running a web server to report performance statistics is that it requires the installation of software on your production systems. Installation of extra software on production systems should be avoided for many reasons. Extra software may leak memory or otherwise use up resources, it may open security holes, and it may just be a pain to do. Fortunately, all you really need for a pretty good monitoring system is permission to use tools and interfaces that are probably already there, such as the rstat daemon, SQL, SNMP, and Telnet. They will not work through a firewall the way web server-based tools do, but your monitoring system should be on the secure side of your firewall in most cases anyway.

Telnet is the most flexible of these interfaces, because a Telnet session can do anything a user can do. Here is an example script that will log in to a machine given on the command line, run ps, and dump the results to standard output:

#!/usr/local/bin/perl

use Net::Telnet;

$host = $ARGV[0];
$user = "patrick";
$password = "passwd";

my $telnet = Net::Telnet->new($host);
$telnet->login($user, $password);
my @lines = $telnet->cmd('/usr/bin/ps -o pid,pmem,pcpu,nlwp,user,args');
print @lines;
$telnet->close;

This ability to run a command from Telnet and use the results in a Perl script is incredibly useful, but it is also easily abused, for example by running many such scripts every minute from cron jobs. Every login takes a tiny bit of disk space (for instance, each is recorded in /var/run/utmp on Linux). Please be cautious about how much monitoring you are doing, lest you become part of your own performance problem.

To fill out a per-process monitoring system built on Telnet and ps, we would like to store our data in a relational database. For that, we need to define a database table. Here’s the definition of an Oracle table that I use to store the output of the Solaris command /usr/bin/ps -e -o pid,pmem,pcpu,nlwp,user,args:

create table ps(
   machine    varchar2(20),
   timestamp  date not null,
   pid        number(6),
   pmem       number(3,1),
   pcpu       number(3,1),
   nlwp       number(5),
   usr        varchar2(8),
   args       varchar2(24)
);

And here’s a Perl script that can telnet to various machines, run ps, and store the resulting ps data in a database:

#!/usr/local/bin/perl

$ENV{ORACLE_HOME} = "/opt/ORACLE/product";

use DBI;
use Net::Telnet;

$user      = "patrick";
$pwd       = "telnetpasswd";

$machine = $ARGV[0];

my $telnet = Net::Telnet->new($machine);
$telnet->timeout(45);
$telnet->login($user,$pwd);
my @lines = $telnet->cmd('/usr/bin/ps -e -o pid,pmem,pcpu,nlwp,user,args');
$telnet->close;

$dbh = DBI->connect("dbi:Oracle:acsiweba", "patrick", "dbpasswd")
   or die "Can't connect to Oracle: $DBI::errstr\n";

foreach (@lines) {
   # Grab the lines with "Didentifier=" in them, and parse out this data:
   #     PID      %MEM      %CPU   NLWP   USER         COMMAND
   #
   # and insert the fields into ps table, which looks like this:
   #
   # Name                                      Null?   Type
   # -------------------------------------------------------
   # MACHINE                                            VARCHAR2(20)
   # TIMESTAMP                                 NOT NULL DATE
   # PID                                                NUMBER(6)
   # PMEM                                               NUMBER(3,1)
   # PCPU                                               NUMBER(3,1)
   # NLWP                                               NUMBER(5)
   # USR                                                VARCHAR2(8)
   # ARGS                                               VARCHAR2(24)

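   if (/(\d+) +(\d+\.\d) +(\d+\.\d) +(\d+) +(\w+) .*Didentifier=(\w+)/) {

       # Placeholders keep ps output out of the SQL text; prepare_cached
       # reuses the same statement handle on every pass through the loop.
       $sth = $dbh->prepare_cached("insert into ps values (?, sysdate, ?, ?, ?, ?, ?, ?)");
       $sth->execute($machine, $1, $2, $3, $4, $5, $6);
   }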
}

$dbh->disconnect or warn "Disconnect failed: $DBI::errstr\n";
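
The line-parsing step can be tested on its own, without Telnet or Oracle. The sketch below is a variation on the script above, not the same logic: instead of extracting only the value after -Didentifier=, it keeps the start of the whole command line, truncated to the 24 characters that the args column allows. The function name parse_ps_line and the sample ps output are made up for illustration.

```perl
use strict;
use warnings;

# Parse one line of `ps -e -o pid,pmem,pcpu,nlwp,user,args` output.
# Returns a hash reference, or nothing for the header or a malformed line.
sub parse_ps_line {
    my ($line) = @_;
    return unless $line =~ /^\s*(\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+)\s+(\S+)\s+(.*\S)/;
    return {
        pid  => $1,
        pmem => $2,
        pcpu => $3,
        nlwp => $4,
        usr  => $5,
        # The ps table declares args as varchar2(24), so truncate to fit.
        args => substr($6, 0, 24),
    };
}

# Hypothetical sample output, made up for illustration only.
my @lines = (
    "  PID %MEM %CPU NLWP     USER COMMAND\n",
    "  412  1.3  0.2    7     root /usr/sbin/nscd\n",
    " 9031 12.5 48.9   31  patrick /usr/bin/java -Didentifier=web1 -Xmx256m Server\n",
);

for my $line (@lines) {
    my $row = parse_ps_line($line) or next;   # skips the header line
    print join("|", @{$row}{qw(pid pmem pcpu nlwp usr args)}), "\n";
}
```

Each returned hash maps directly onto the bind values of an insert into the ps table, with the machine name and sysdate supplied separately.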
