Preface

When I first started using the Internet in 1986, my friends and I were obsessed with anonymous FTP servers. What a wonderful concept! We could download all sorts of interesting files, such as FAQs, source code, GIF images, and PC shareware. Of course, downloading could be slow, especially from the busy sites like the famous WSMR-SIMTEL20.ARMY.MIL archive.

In order to download files to my PC, I would first ftp them to my Unix account and then use Zmodem to transfer them to my PC through my 1200 bps modem. Usually, I deleted a file after downloading it, but there were certain files—like HOSTS.TXT and the “Anonymous FTP List”—that I kept on the Unix system. After a while, I had some scripts to automatically locate and retrieve a list of files for later download. Since our accounts had disk quotas, I had to carefully remove old, unused files and keep the useful ones. Also, I knew that if I had to delete a useful file, Mark, Mark, Ed, Jay, or Wim probably had a copy in their account.

Although I didn’t realize it at the time, I was caching the FTP files. My Unix account provided temporary storage for the files I was downloading. Frequently referenced files were kept as long as possible, subject to disk space limitations. Before retrieving a file from an FTP server, I often checked my friends’ “caches” to see if they already had what I was looking for.

Nowadays, the World Wide Web is where it’s at, and caching is here too. Caching makes the Web feel faster, especially for popular pages. Requests for cached information come back much faster than requests sent to the content provider. Furthermore, caching reduces network bandwidth consumption, which translates directly into cost savings for many organizations.

In many ways, web caching is similar to the way it was in the Good Ol’ Days. The basic ideas are the same: retrieve and store files for the user. When the cache becomes full, some files must be deleted. Web caches can cooperate and talk to each other when looking for a particular file before retrieving it from the source.
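The three ideas above (store what you retrieve, delete something when full, ask a neighbor before going to the source) can be sketched in a few lines of Python. This is only a toy illustration, not how any real web cache is implemented: the class name TinyCache, the peek and get methods, and the choice to delete the least recently used file are all assumptions made for the example, and it ignores everything HTTP-specific such as freshness and validation.

```python
from collections import OrderedDict


class TinyCache:
    """Toy cache: stores files and evicts the least recently used when full."""

    def __init__(self, capacity, peers=()):
        self.capacity = capacity      # maximum number of stored files
        self.peers = list(peers)      # neighbor caches to ask before the source
        self.store = OrderedDict()    # name -> content, least recently used first

    def peek(self, name):
        """Return a stored copy without fetching anything, or None."""
        return self.store.get(name)

    def get(self, name, fetch_from_source):
        # 1. Local hit: mark the file as recently used and return it.
        if name in self.store:
            self.store.move_to_end(name)
            return self.store[name], "local"

        # 2. Ask cooperating caches before going to the source.
        content, where = None, "source"
        for peer in self.peers:
            content = peer.peek(name)
            if content is not None:
                where = "peer"
                break

        # 3. Fall back to the original server.
        if content is None:
            content = fetch_from_source(name)

        # 4. Store the file, deleting the least recently used one if full.
        self.store[name] = content
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)
        return content, where


if __name__ == "__main__":
    def origin(name):
        # Hypothetical stand-in for a slow download from the real server.
        return f"contents of {name}"

    friend = TinyCache(capacity=2)
    friend.get("HOSTS.TXT", origin)
    mine = TinyCache(capacity=2, peers=[friend])
    print(mine.get("HOSTS.TXT", origin)[1])  # peer
    print(mine.get("HOSTS.TXT", origin)[1])  # local
```

The first request is answered from the friend’s copy and the second from the local store, so the (pretend) origin server is contacted only once, when the friend first fetched the file.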

Of course, web caching is significantly more sophisticated and complicated than what I did in my early Internet years. Caches are tightly integrated into the web architecture, often without the user’s knowledge. The Hypertext Transfer Protocol was designed with caching in mind. This gives users and content providers more control (perhaps too much) over the treatment of cached data.

In this book, you’ll learn how caches work, how clients and servers can take advantage of caching, what issues are important, how to design a caching service for your organization, and more.

Audience

The material in this book is relevant to the following groups of people:

Administrators

This book is primarily written for those of you who are, or will be, responsible for the day-to-day operation of one or more web caches. You might work for an ISP, a corporation, or an educational institution. Or perhaps you’d like to set up a web cache for your home computer.

Content providers

I sincerely hope that content providers take a look at this book, and especially Chapter 6, to see how making their content more “cache aware” can improve their users’ surfing experiences.

Web developers

Anyone developing an application that uses HTTP needs to understand how web caching works. Many users today are behind firewalls and caching proxies. A significant amount of HTTP traffic is automatically intercepted and sent to web caches. Failure to take caching issues into consideration may adversely affect the operation of your application.

Web users

Usually, the people who deploy caches want them to be transparent to the end user. Indeed, users are often unaware that they are using a web cache. Even so, if you are “only” a user, I hope that you find this book useful and interesting. It can help you understand why you sometimes see stale web pages and what you can do about them. If you are concerned about your privacy on the Internet, be sure to read Chapter 3. If you want to know how to configure your browser for caching, see Chapter 4.

To benefit from this book, you need to have only a user-level understanding of the Web. You should know that Netscape Navigator and Internet Explorer are web browsers, that Apache is a web server, and that http://www.oreilly.com is a URL. If you have some Unix system administration experience, you can use some of the examples in later chapters.
