Bust the Cache for Accuracy

Measurement solutions based on web server logfiles suffer from a variety of factors that decrease their accuracy. Caching devices are the primary culprits but, in some cases, the cache can be beaten and accuracy improved.

Web server logfiles suffer from a handful of accuracy issues, perhaps the most significant arising from caching devices on the Internet. A caching device is any piece of hardware or software designed to store temporary copies of a file, most often to improve delivery performance. There are two types of caching devices that create problems for web server logfiles: client-side caches and server-side caches.

Client-side caches are deployed locally in corporate network operation centers and at Internet Service Providers to improve performance. The most extreme example of a client-side cache is the browser cache, software built into your Internet browser that is designed to save local copies of files. Server-side caches are often placed in front of your own web servers to reduce load. (See Web Caching [O’Reilly] for a complete treatise on the subject, or, if you prefer going online, Wikipedia has an excellent entry at http://en.wikipedia.org/wiki/Web_cache.)

The essential problem with caching is this: when a document is served from a cache, the request never reaches your web server, so it never makes it into the web server log. Depending on how many of your pages are cached, the result can be a dramatic undercounting of page views, which then cascades ...
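To make the undercounting concrete, here is a minimal sketch in Python. It assumes a single overall cache hit ratio, which is a simplification, since real caches behave differently per page and per visitor. The function names and the sample numbers are hypothetical and are not taken from any real logfile.

```python
def logged_page_views(actual_views: int, cache_hit_ratio: float) -> int:
    """Estimate how many page views reach the web server log.

    cache_hit_ratio is the assumed fraction of requests answered by a
    cache (browser, proxy, or server-side) instead of the web server.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    # Requests answered by a cache never reach the web server,
    # so only the cache misses are written to the logfile.
    return round(actual_views * (1.0 - cache_hit_ratio))


def undercount_percent(actual_views: int, logged_views: int) -> float:
    """Return the share of page views missing from the log, in percent."""
    if actual_views <= 0:
        raise ValueError("actual_views must be positive")
    return 100.0 * (actual_views - logged_views) / actual_views


# Hypothetical figures: 10,000 real page views, a quarter served from caches.
logged = logged_page_views(10_000, 0.25)
print(logged)                              # 7500
print(undercount_percent(10_000, logged))  # 25.0
```

In practice you rarely know the true number of page views, which is exactly why cached requests are so hard to account for when logfiles are your only source of data.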
