At the most basic level, web servers accept requests and return replies. The reply can be a static page, custom dynamic content, or an error. While there is a lot of variation in performance depending on load, an individual request for a static page typically takes only one- or two-tenths of a second from the time the request arrives on the network until the response is pushed back out. Modem latency, Internet latency, and even browser parsing time are all likely to be larger than that, so a lightly loaded web server will not be a bottleneck.
A heavily loaded web server is another story. Web servers tend to go nonlinear when loaded beyond a certain point, degrading rapidly in performance. This chapter is about why that happens and what your options are for getting the most out of your web server software.
The first generation of web servers were just another Unix service launched on demand from inetd, which reads /etc/services on startup and listens to the ports specified there. When a request comes in on one of inetd’s ports, it launches the program specified in /etc/services to deal with requests on that port. This requires calling fork() to clone inetd and get a new process, and then exec() to overwrite that process with another program that can service the request. This mechanism is intended to conserve system resources by starting up daemons ...