Alarms Abound

I was just sitting down to eat dinner when my phone started to beep and vibrate. Nagios was reporting that our main database server was not responding. I logged on to my laptop and dialed the phone at the same time to see if anyone else knew what was going on. At first, we were not sure what was happening. The database server was loaded with queries, but they were all identical, and I recognized them: each one was the query used to generate the cache for the front-page content. But why were so many running? We would soon hear about the Yahoo! blog post from our chief editor. It turned out that the PR firm had told us about the article, but as far as we knew it was going to be published after Christmas, not before. Figure 9-1 shows that at 8:00 p.m. EST, page views increased to a level that was abnormal for this time of day.

Figure 9-1. Page views during Yahoo! front-page spike

At that time, our infrastructure used a reactive caching system. This is a very traditional system and is usually the first example you'll find when learning about caching techniques for scaling websites. It works like this:

  1. A request comes in for a page.

  2. If the data needed is in the cache (a cache hit), the cache is used.

  3. If the data is not found in the cache (a cache miss), code is run to generate the data needed.

  4. The generated data is then placed into the cache for future requests.
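The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code the site actually ran: the in-process dictionary stands in for whatever cache store was used, and the names `cache`, `handle_request`, and `build_front_page` are hypothetical.

```python
from typing import Callable, Dict

# Assumption: a plain in-process dict stands in for the real cache store.
cache: Dict[str, str] = {}


def build_front_page() -> str:
    """Hypothetical stand-in for the expensive front-page database query."""
    return "<html>front page</html>"


def handle_request(key: str, generate: Callable[[], str]) -> str:
    """Serve one request using reactive caching."""
    # Step 2: cache hit, so the cached data is used.
    if key in cache:
        return cache[key]
    # Step 3: cache miss, so run the code that generates the data.
    data = generate()
    # Step 4: place the generated data in the cache for future requests.
    cache[key] = data
    return data


if __name__ == "__main__":
    # The first request misses and generates the page; the second is a hit.
    first = handle_request("front-page", build_front_page)
    second = handle_request("front-page", build_front_page)
    print(first == second)
```

In this sketch, the generating function runs only when the key is absent, so a second request for the same key is served from the dictionary without touching the database.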

That is a great ...
