Putting Out the Fire

So, the cache stampede caused a rush on our database. Our first priority was to get the cache back in place. To do that, we had to start killing MySQL threads. A side effect of that, however, was that the requests waiting on the database server were now writing an empty cache entry to memcached. This made our front page blank. But, for the moment, that was fine. To get the cache filled again, we manually ran the function that created the cache for our front page. This refreshed the cache, and we had content again. Of course, we knew this was only a temporary solution—a five-minute solution, in fact. The cache would live for only five minutes before it expired.
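The blank front page came from requests storing an empty result after their queries were killed. One way to guard against that is to refuse to cache an empty or failed result. The sketch below illustrates the idea and is not the code we ran. `InMemoryCache`, `refresh_front_page`, and `build_front_page` are hypothetical names, and the in-memory class only stands in for a real memcached client.

```python
import time


class InMemoryCache:
    """Hypothetical stand-in for a memcached client (get/set with a TTL)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Entry has outlived its TTL; treat it as a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (value, time.monotonic() + ttl)


FRONT_PAGE_TTL = 300  # five minutes, the cache lifetime described above


def refresh_front_page(cache, build_front_page, key="front_page"):
    """Rebuild the cache entry, but never store an empty or failed result.

    build_front_page is an assumed callable that runs the expensive
    database query and returns the rendered content.
    """
    try:
        content = build_front_page()
    except Exception:
        # For example, the database thread was killed mid-query.
        content = None
    if not content:
        return False  # leave whatever is (or is not) in the cache untouched
    cache.set(key, content, FRONT_PAGE_TTL)
    return True
```

With a guard like this, a killed query leaves the key missing rather than blank, so the next successful rebuild (manual or otherwise) fills it with real content.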

Even with the cache in place, we started to see other problems. For one, the Apache servers had all reached their MaxClients limit. We had kept MaxClients set to a conservative number so that the servers would not become CPU bound without our knowing it. We had also never really been able to push these servers to their limits, so we did not know exactly what they could handle. So, we increased MaxClients and restarted the servers.

Our success was short-lived, however: for each PHP request that came in and started an Apache client, a connection to our memcached pool was being created. This caused our memcached daemons to reach their connection limits quickly. So, we had to go through and update their configurations as well. We knew that memcached could handle lots of connections, so we proactively set its connection limits much higher than needed at ...
