Corralling the Stampede

The next step was to figure out how to avoid the cache stampede. There were two problems with how we accessed our data. The first problem was that the queries we were running were very complex. Our database is highly relational and normalized; we try to never duplicate data and always create good relationships between tables, resulting in a system that is very flexible and can gather data in any way we need. However, this is not good for performance.

The decision was made to denormalize (some prefer the word optimize) our data for use by our public facing application servers. In all, some 100+ tables were reduced to about 30 in this scheme. We did not lose our highly relational data. This would be a full duplicate of the relational data in a format that was easier to digest by our application servers. In cases where we had one-to-many relationships, we would gather the data from the many side of the relationship and shove the data into a TEXT field in a serialized format. This was one less join we had to use in our query. In cases where it made sense to keep some data in a relational format we would do so.

But simply having faster queries would not solve all of our data problems. The second problem was that we were relying on one database server (with a hot spare) for all of our data. The stampede therefore created a bottleneck of connections on the single server. To remedy this, we thought of a couple of possible solutions.

The first thing we tried was MySQL ...

Get Web Operations now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.