How Apache Works

Apache is a program that runs under a suitable multitasking operating system. In the examples in this book, the operating systems are Unix and Windows 95/98/2000/Me/NT/..., which we call Win32. There are many others: flavors of Unix, IBM’s OS/2, and Novell Netware. Mac OS X has a FreeBSD foundation and ships with Apache.

The Apache binary is called httpd under Unix and apache.exe under Win32 and normally runs in the background.[5] Each copy of httpd/apache that is started has its attention directed at a web site , which is, for our purposes, a directory. Regardless of operating system, a site directory typically contains four subdirectories:

conf

Contains the configuration file(s), of which httpd.conf is the most important. It is referred to throughout this book as the Config file. It specifies the URLs that will be served.

htdocs

Contains the HTML files to be served up to the site’s clients. This directory and those below it, the web space, are accessible to anyone on the Web and therefore pose a severe security risk if used for anything other than public data.

logs

Contains the log data, both of accesses and errors.

cgi-bin

Contains the CGI scripts. These are programs or shell scripts written by or for the webmaster that can be executed by Apache on behalf of its clients. It is most important, for security reasons, that this directory not be in the web space — that is, in .../htdocs or below.

In its idling state, Apache does nothing but listen to the IP addresses specified in its Config file. When a request appears, Apache receives it and analyzes the headers. It then applies the rules it finds in the Config file and takes the appropriate action.

The webmaster’s main control over Apache is through the Config file. The webmaster has some 200 directives at her disposal, and most of this book is an account of what these directives do and how to use them to reasonable advantage. The webmaster also has a dozen flags she can use when Apache starts up.

Tip

We’ve quoted most of the formal definitions of the directives directly from the Apache site manual pages because rewriting seemed unlikely to improve them, but very likely to introduce errors. In a few cases, where they had evidently been written by someone who was not a native English speaker, we rearranged the syntax a little. As they stand, they save the reader having to break off and go to the Apache site



[5] This double name is rather annoying, but it seems that life has progressed too far for anything to be done about it. We will, rather clumsily, refer to httpd/apache and hope that the reader can pick the right one.

Get Apache: The Definitive Guide, 3rd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.