Now let’s look at things from the server side. Here’s what you should look at if your web server seems sluggish.
If you are running a web site from a PC, be sure to disable the power conservation features that spin down the disk and go into sleep mode after a period of inactivity. Sleep mode will slow down the first user who hits your site while it is sleeping, because it takes a few moments for the disk to spin up again. Some operating systems — for example, Mac OS X — are capable of quickly serving pages in their sleep; but even they will eventually have to wake up to log to disk, so it is best to turn off sleep mode.
DNS servers can become overloaded like anything else on the Internet. Since DNS lookups block the calling process, a slow DNS server can have a big impact on perceived performance. Check whether your DNS server’s CPU or network load is nearing its capacity by monitoring that machine’s hardware statistics. See Chapter 4 for more information on monitoring.
If you determine that your DNS server is a problem, consider setting up additional servers or simply pointing your DNS resolver at another DNS server. Under Linux, you switch DNS servers by editing /etc/resolv.conf; under Windows, use the Network Control Panel.
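To see whether resolver latency is hurting you, you can time a blocking lookup directly. Here is a small Python sketch (the helper name is mine; substitute any hostname your users commonly hit):

```python
import socket
import time

def timed_lookup(name):
    """Resolve a hostname and report how long the blocking call took."""
    start = time.monotonic()
    addr = socket.gethostbyname(name)   # blocks the calling process until the resolver answers
    return addr, time.monotonic() - start

addr, elapsed = timed_lookup("localhost")
print(f"{addr} resolved in {elapsed * 1000:.1f} ms")
```

If lookups of ordinary names regularly take hundreds of milliseconds, the DNS server itself is a good suspect.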
Netscape browsers do not display a page at all until all image sizes are known. If you do not include image sizes in your HTML, the browser must actually download every image before it knows the sizes, resulting in a long delay before the user sees anything at all. Many users also turn off image loading for one reason or another, but would still like to know what kind of image they are missing, especially if you use images for navigation. So for best performance and usability, make sure every image has size attributes in the HTML, like this:
<img src="images/foo.gif" alt="Picture of a Foo" width=190 height=24>
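You can pull the width and height out of the image files themselves rather than measuring by hand. This Python sketch reads the dimensions from a GIF header; the 190x24 header here is built in memory to stand in for a real images/foo.gif:

```python
import struct

def gif_size(data: bytes):
    """Read width and height from a GIF header (bytes 6..9, little-endian)."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF")
    w, h = struct.unpack("<HH", data[6:10])
    return w, h

# A minimal in-memory header standing in for a real file on disk.
header = b"GIF89a" + struct.pack("<HH", 190, 24)
w, h = gif_size(header)
print(f'<img src="images/foo.gif" alt="Picture of a Foo" width={w} height={h}>')
```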
Similarly, many users turn off Java because VM startup time and applet download time are very annoying. Like the ALT text for images, any text within the <APPLET></APPLET> tags will be displayed when Java is off, so the user will have an idea of whether he wants to turn Java back on and reload the page. This text can include any valid HTML, so it is possible for the content designer to create a useful alternative to the applet and put it within the applet tag.
<META HTTP-EQUIV="Refresh" Content="2;URL=http://www.gohere.com">
Avoid redirects if at all possible because they waste time. But if you have to use one, at least make it fast by putting in a zero-second delay.
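For completeness, here is one way to generate the zero-delay form of such a refresh page; the helper name and example URL are made up:

```python
def refresh_page(url, delay=0):
    """Build a minimal page that redirects after `delay` seconds (0 = immediately)."""
    return ('<html><head>'
            f'<meta http-equiv="Refresh" content="{delay};URL={url}">'
            '</head><body>'
            f'If you are not redirected, follow <a href="{url}">this link</a>.'
            '</body></html>')

print(refresh_page("http://www.example.com/new-home.html"))
```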
Web servers are often configured by default to take the IP address of the client and do a reverse DNS lookup on it (finding the name associated with the IP address), either to pass the name to the logging facility or to fill in the REMOTE_HOST CGI environment variable. This is time-consuming and unnecessary, since a log-parsing program can do all the lookups when it parses your log file later.
You might be tempted to turn off logging altogether, but that would not be wise. You really need logs to show how much bandwidth you’re using, whether it’s increasing, and lots of other valuable performance information. You just don’t need to log DNS names. CGIs can also do the reverse lookup themselves if they need it. Every web server has the option to turn off reverse DNS lookups in its configuration files. Refer to your web server’s documentation.
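The parse-time approach looks like this in Python: split the address out of each Common Log Format line, and do the reverse lookup only when the report is generated. The helper names are mine; the lookup falls back to the bare address if no name can be found.

```python
import socket

def parse_clf(line):
    """Split a Common Log Format line into the client IP and the rest."""
    ip, rest = line.split(" ", 1)
    return ip, rest

def resolve_later(ip):
    """Reverse lookup done by the log parser, not by the web server."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ip   # no name found; keep the bare address

line = '127.0.0.1 - - [01/Jan/2024:00:00:00 +0000] "GET / HTTP/1.0" 200 1043'
ip, _ = parse_clf(line)
print(ip, "->", resolve_later(ip))
```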
At the start of a connection, TCP assumes a segment has been lost if it has not been acknowledged within a certain amount of time, typically 200 milliseconds. For some slow Internet connections, this is not long enough: segments may be arriving safely at the browser, only to be counted as lost by the server, which then retransmits them, wasting bandwidth. Raising the TCP retransmit timeout will fix this problem, but it will also reduce performance for fast but lossy connections, where reliability is poor even though speed is good. For long-lived connections, TCP dynamically adapts its timeout to the measured performance of the path, but most connections to web servers are short, so the initial timeout setting has a big impact.
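To see why short connections suffer, here is a simplified sketch of the standard retransmission-timeout estimator, in the style of RFC 6298 (the constants are the usual smoothing gains). With only a few round-trip samples, the timeout barely has a chance to adapt before the connection is over:

```python
def rto_estimate(samples, alpha=1/8, beta=1/4):
    """Smoothed RTT plus variance, as in RFC 6298 (simplified sketch)."""
    srtt = rttvar = None
    for r in samples:
        if srtt is None:
            srtt, rttvar = r, r / 2           # first sample initializes the estimator
        else:
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
            srtt = (1 - alpha) * srtt + alpha * r
    return srtt + max(4 * rttvar, 0.001)      # clamp timer granularity at 1 ms

# Three round trips of roughly 200 ms each, all a short HTTP transfer may get.
print(round(rto_estimate([0.20, 0.22, 0.21]), 3))
```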
Internet Protocol data packets must go through a number of forks in the road on the way from the server to the client. Dedicated computers called routers make the decision about which fork to take for every packet. That decision, called a router “hop,” takes some small but measurable amount of time, typically a millisecond or two. Servers should be located as few router hops away from the audience as possible.
ISPs usually have their own high-speed network connecting all of their dial-in points of presence (POPs). A web surfer on a particular ISP will probably see better network performance from web servers on that same ISP than from web servers located elsewhere, partly because there are fewer routers between the surfer and the server. National ISPs are near a lot of people. If you know most of your users are on AOL, for example, get one of your servers located inside AOL. The worst situation is to try to serve a population far away, forcing packets to travel long distances and through many routers. A single HTTP transfer from New York to Sydney can be painfully slow to start and simply creep along once it does start, or just stall. The same is true for transfers that cross small distances but too many routers. Another solution is to host your data on one of the many content distribution services, such as Akamai.
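A back-of-the-envelope model shows why distance and hop count both matter. All the figures below are ballpark assumptions (light in fiber at roughly 200,000 km/s, a millisecond or two per router hop):

```python
def rtt_estimate(km, hops, per_hop_ms=1.5):
    """Rough round-trip time: propagation delay in fiber plus per-hop router delay."""
    propagation_ms = 2 * km / 200_000 * 1000   # there and back at ~200,000 km/s
    return propagation_ms + 2 * hops * per_hop_ms

# New York to Sydney, say: ~16,000 km and 20 router hops each way.
print(round(rtt_estimate(16_000, 20), 1))
```

That is over a fifth of a second of pure round-trip time before the server delivers a single byte, no matter how fast the machines on either end are.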
The most effective blunt instrument for servers and users alike is a better network connection, with the caveat that it’s rather dangerous to spend money on it without doing any analysis. For example, a better network connection won’t help an overloaded server in need of a faster disk or more RAM. In fact, it may crash the server because of the additional load from the network.
While server hardware is rarely the bottleneck for serving static HTML, a powerful server is a big help if you are generating a lot of dynamic content or making a lot of database queries. If the CPU usage is at 100 percent, you have found a problem that needs immediate attention.
Whether you will benefit from a CPU upgrade depends entirely on the problem, and the vendor is not likely to tell you that you don’t really need more hardware; you may just have a poorly written application. If you’ve profiled your application and really do need the extra power, it helps to upgrade from PC hardware to Unix boxes from Sun, IBM, or HP, which have much better I/O subsystems and scalability. Monitor your server’s hardware utilization so you are aware of hardware bottlenecks.
On Solaris up to Version 7, run vmstat and look at the sr column, which is the scan rate for free memory. If the sr column is consistently above zero, you have a memory shortage. Other indications that you are short of memory are any swapping (swapping activity should be zero at all times) or consistent paging. On Solaris 8 and later, look at free memory instead.
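You can pick the sr column out of vmstat output mechanically, for example to feed a monitoring script. The sample below is trimmed (real vmstat prints extra header lines), but the idea carries over:

```python
# Trimmed vmstat-style output: header row, then one row per interval.
SAMPLE = """\
 r b w   swap  free  re  mf pi po fr de sr
 0 0 0 466856 71152   0   2  1  0  0  0  0
 0 0 0 450080 54688   0  14  3  0  0  0 37
"""

def scan_rates(text):
    """Pull the sr (page scan rate) column out of vmstat-style output."""
    rows = text.splitlines()
    col = rows[0].split().index("sr")
    return [int(row.split()[col]) for row in rows[1:]]

print(scan_rates(SAMPLE))   # consistently nonzero sr values mean a memory shortage
```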
RAM accesses data thousands of times faster than any disk. So getting more data from RAM rather than from disk can have a huge positive impact on performance. All free memory will automatically be used as filesystem cache in most versions of Unix and in NT, so your machine will perform repetitive file serving faster if you have more RAM. Web servers themselves can make use of available memory for caches. More RAM also gives you more room for network buffers and more room for concurrent CGIs to execute.
You may have plenty of memory, yet find it gets used up over time because a process is leaking (losing references to allocated memory). Simply by looking at the size of individual processes over time with top, you should be able to get a feel for which ones are leaking memory. They will have to either be fixed or restarted on a regular basis.
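The "watch sizes over time" test can be made mechanical. This sketch flags a process whose resident size, sampled from successive runs of top, only ever grows; the 10 percent growth threshold is an arbitrary assumption:

```python
def looks_leaky(rss_samples, min_growth=1.10):
    """Flag a process whose resident set size never shrinks across samples
    and has grown by at least min_growth overall (threshold is a guess)."""
    rising = all(b >= a for a, b in zip(rss_samples, rss_samples[1:]))
    return rising and rss_samples[-1] >= rss_samples[0] * min_growth

print(looks_leaky([51200, 53400, 58800, 66100]))  # steadily growing: suspicious
print(looks_leaky([51200, 49800, 52100, 50700]))  # normal churn up and down
```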
On Solaris, look at the output from iostat -x. Disk access latencies consistently higher than 100 milliseconds are a cause for concern. When buying disks, get those with the lowest seek time, because disks spend most of their time seeking (moving the arm to the correct track) in the kind of random access typical of web serving.
A collection of small disks is often better than a single large disk. 10,000 rpm is better than 7,200 rpm. Bigger disk controller caches are better. SCSI is better than IDE or EIDE. But all of these things cost more money as well.
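Some quick arithmetic shows where those options pay off. The model below (average seek, plus half a rotation, plus transfer time; all figures illustrative) shows that for small random reads, seek and rotation dominate, so lower seek times and faster spindles matter far more than raw transfer rate:

```python
def avg_service_ms(seek_ms, rpm, transfer_mb_s, kb):
    """Rough per-request disk service time for a small random read."""
    rotational = 0.5 * 60_000 / rpm                 # wait half a revolution, on average
    transfer = kb / 1024 / transfer_mb_s * 1000     # time to move the data itself
    return seek_ms + rotational + transfer

# An 8 KB random read: 7,200 rpm versus 10,000 rpm, same seek and transfer rate.
print(round(avg_service_ms(5.0, 7_200, 40, 8), 2))
print(round(avg_service_ms(5.0, 10_000, 40, 8), 2))
```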
Use multiple mirrored servers of the same capacity and balance the load between them. There are now many commercial services, such as Akamai, that provide caching servers. Your load will naturally be balanced to some degree if you are running a web site with an audience scattered across time zones or around the world, such as a web site for a multinational corporation.
Software generally gets faster and better with each revision. At least that’s how things are supposed to work. Try the latest version of the operating system and web server and apply all of the non-beta patches, especially the networking and performance-related patches. This rule can sometimes be profitably broken, since old software often takes less memory.
If a performance problem happens only at certain intervals, check what cron or Autosys jobs the server is running. (Autosys is a commercial version of cron from Computer Associates.) These intermittent problems can be infuriating if you notice the slowdown and look for the culprit just as it finishes and goes away. You might just leave perfmeter running if you’re on Solaris to look for regular CPU spikes. This should illustrate repeating load patterns well. You can disable the cron daemon if necessary.
Don’t run anything unnecessary for web service on your web server, middleware, or database machine. In particular, your web server should not be an NFS server, an NNTP server, a mail server, or a DNS server; find other homes for those things. Run top (or Task Manager on Windows, or prstat on Solaris 8) and figure out which processes are using the most CPU and memory. Kill all unnecessary daemons, such as lpd.
Don’t even run a windowing system on your web server. You don’t really need it, and it takes up a lot of RAM. Terminal mode is sufficient for you to administer your web server. On Windows, however, you don’t have any choice; Windows always wastes memory and CPU on the GUI because there is no terminal mode.
Server Side Includes (SSI) are very inefficient. SSI means that the server parses your HTML, looking for commands to run programs and insert content. It is better to generate the whole page dynamically from one CGI or servlet than to run SSIs. CGI itself is not as bad as it used to be, because operating systems, pushed by demand from the Web, have improved their ability to run many short-lived processes.
You may think that you have to generate content on demand where that’s not really the case. You can update static HTML many times a day, giving the impression of dynamic content without incurring nearly the same overhead. It depends on the number of possible inputs from the user. If there are only a few, you can precalculate responses to them all.
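For instance, if a page’s only input is one of a handful of choices, you can precompute a static page for each choice and serve those. A toy sketch (the page content and filenames are stand-ins):

```python
# Precompute one static page per possible input, instead of generating
# the same answer over and over on every request.
INPUTS = ["red", "green", "blue"]

def render(color):
    """Stand-in for whatever expensive page generation you do today."""
    return f"<html><body>You picked {color}.</body></html>"

pages = {f"pick-{c}.html": render(c) for c in INPUTS}
print(sorted(pages))
```

A cron job can regenerate the files as often as the underlying data changes, and the web server never runs the generation code in the request path.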
If you use a middleware server that keeps a database connection pool, beware that growing that pool on demand is very bad for performance. You may be able to start the pool high enough that it will not need to increase. A typical symptom is that performance is fine at low loads, but intermittently slow as the load increases and the pool takes time to grow.
If you are allocating database connections from a pool but not reclaiming them, you may be forcing unnecessary growth of the pool, or even bringing your site to a halt until unused connections time out and are collected. To find such leaks, you can watch the number of connections used under load. Chapter 4 has a script that can screen scrape the Weblogic Admin web page and graph usage. To fix the connection leak, you will have to closely examine your code for overt failures to release connections, and for possible exceptions that can divert code from releasing connections.
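A pool that is fully created up front, combined with a discipline of always releasing what you acquire, avoids both problems. A minimal sketch (the pool class and its factory argument are mine, not any particular middleware API):

```python
import queue

class ConnectionPool:
    """Fixed-size pool: every connection is created up front, so the pool
    never has to grow under load.  `connect` is a stand-in factory."""
    def __init__(self, size, connect):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(connect())

    def acquire(self, timeout=5.0):
        return self._q.get(timeout=timeout)   # blocks if the pool is exhausted

    def release(self, conn):
        self._q.put(conn)

pool = ConnectionPool(4, connect=lambda: object())
c = pool.acquire()
pool.release(c)              # forgetting this call is exactly a connection leak
print(pool._q.qsize())
```

A leaked connection shows up here as a queue that never returns to its full size, which is easy to graph under load.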
Most network hardware, such as hubs, switches, and routers, is SNMP-compliant, meaning it will report statistics on its load and collision rates to any SNMP-compliant tool. Watch these statistics for signs of overload. Overloaded hubs are especially likely to be offenders, and are easily replaced with better-performing switches. Also beware of Ethernet connections misconfigured so that one side is full duplex while the other side is not.
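A quick sanity check on interface counters (for example, the collision and output-packet columns of netstat -i) is easy to script. The "few percent" figure in the comment is a rule of thumb, not a standard:

```python
def collision_pct(collisions, output_packets):
    """Collision rate from interface counters.  A few percent on a shared
    hub is normal; much more than that suggests an overloaded segment."""
    return 100.0 * collisions / max(output_packets, 1)

print(round(collision_pct(1_200, 48_000), 1))
```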
If your Java server stalls periodically, it is probably pausing for garbage collection (GC). Since GC is usually single-threaded, you may see one CPU at 100 percent while the others are at 0 percent during the stall; use mpstat on Solaris to see each CPU’s load. Increasing the initial and maximum heap sizes helps delay the inevitable GC, but it also makes each collection take longer on most VMs; IBM’s generational garbage-collecting VM may be an exception. Also, set -verbosegc when you start the Java VM so you can see clearly when garbage collection is happening. The latest JDK releases from Sun give the programmer some control over GC.
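The -verbosegc output can be post-processed to add up pause time. GC log formats vary from VM to VM, so the pattern below is an assumption fitted to one common classic-JVM style:

```python
import re

# Matches lines like "[GC 32768K->8192K(65536K), 0.0213 secs]"; the exact
# format depends on the VM, so treat this pattern as a starting point.
GC_RE = re.compile(r"\[(Full )?GC (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]")

def gc_pause_total(log):
    """Total seconds the VM spent paused in GC, summed over the log."""
    return sum(float(m.group(5)) for m in GC_RE.finditer(log))

log = ("[GC 32768K->8192K(65536K), 0.0213 secs]\n"
       "[Full GC 8192K->4096K(65536K), 0.4127 secs]\n")
print(round(gc_pause_total(log), 4))
```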
Don’t make remote calls if you can possibly avoid them. The overhead of serializing object parameters is very large, and local method calls are many thousands of times faster than remote calls. If at all possible, choose as your client a standard browser displaying HTML for your GUI, not an applet making RMI calls.
Run strace on Linux or truss on Solaris to see what your server processes are doing. It will quickly become apparent if you are doing too much logging: you will see many small write() system calls, all to the same file descriptor. First, try to buffer the logging so that it happens in larger increments; buffered logging is an option on most servers. Second, try not to log from Java programs, to avoid the overhead of temporary object creation and conversion between Unicode and ASCII.
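In Python terms, buffering means handing the I/O library many small writes and letting it coalesce them into a few large write() calls, as in this sketch (the filename and 64 KB buffer size are arbitrary choices):

```python
import os
import tempfile

# 10,000 small log lines, written through a 64 KB buffer so the OS sees
# a handful of large write() calls instead of one tiny call per line.
lines = [f"request {i} ok\n" for i in range(10_000)]

path = os.path.join(tempfile.mkdtemp(), "access.log")
with open(path, "w", buffering=64 * 1024) as f:
    for entry in lines:
        f.write(entry)           # lands in the buffer; flushed in big chunks

print(os.path.getsize(path) == sum(len(entry) for entry in lines))
```

Running the same loop under strace with and without the buffer makes the difference in system-call counts obvious.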
Revision control systems are wonderful for tracking changes to HTML and code, but terrible for performance. Copy your production data to your web servers rather than serving directly out of ClearCase or other revision control systems.
To recap, some common server-side performance problems are:

Expanding a database connection pool
Reverse DNS lookups
TCP retransmit timeouts
Overloaded hubs and switches
Java garbage collection
Waiting for the return of an RMI, CORBA, or EJB call
Writing massive amounts of data to JDBC logs or other logs in tiny increments
Accessing production web content directly from revision control systems such as ClearCase
Too few Apache daemons or Netscape threads