Posted on by & filed under Content - Highlights and Reviews, Information Technology, Web Development.

One of the most important components of a web-based application is its database backend. The open source MySQL server is a very popular database choice. In this post, we will discuss the techniques and challenges involved in successfully implementing a scaling strategy for MySQL databases.

MySQL is an SQL-based relational database. Data is stored in the form of tables that are joined together to fetch results for complex queries. This is how the relational database model works. And it is exactly this model that inherently limits the extent to which relational databases can be scaled. For each complex query, data has to be fetched from multiple tables, combined, and the results sent back. The problem here is the requirement to fetch data from multiple tables and this is the one major feature that distinguishes MySQL databases from NoSQL databases (where no such limitation is present).

There are three main methods to scale a MySQL database setup: Replication, Sharding and Clustering. Replication involves creating multiple database nodes on which data is synchronously updated and as a result multiple nodes can handle the database load. Sharding involves partitioning the database across tables (and maybe even rows) so that data is spread across multiple database nodes. MySQL Cluster is an in-memory offering that is able to carry out automatic Sharding and Replication to provide benefits of both of these techniques in a combined package. We discuss each of these options below.

In replication, data from one server (master) is replication over multiple database servers (slaves). MySQL comes with support for replication built in. This is a good strategy to follow if the read to write ratio is other than 1:1. For most applications, read operations are far greater in number than write operations and as a result this scaling strategy is quite suitable. If the number of write operations are equal to or greater than the read operations (on average) then inconsistency between the database nodes (synchronization involves some delay or latency) might result in some unexpected results.

Sharding is the process of partitioning the schema of the database into multiple partitions and distributing each partition on a different server. Partitioning may be carried out at the table or row level. One drawback of sharding is that the application needs to be aware of the way data is split and the application is responsible for combining the data from the different partitions. Various frameworks and libraries are however available to care for this issue. Also splitting up the database schema is a manual process and has to be carefully planned and carried out. A less-than-optimal configuration might lead to performance degradation.

The last option is to use MySQL Cluster. This is an edition of MySQL that provides a distributed in-memory version of a MySQL database. This version of MySQL implements auto-sharding as well as replication management. While this combination offers the benefits of both Replication and Sharding combined, there are certain trade-offs involved. First, the automatic Sharding can degrade performance if complex queries are executed that require fetching data from more than one table. This is because multiple tables would be spread out across multiple nodes and as a result data from these nodes would need to be obtained and thus network latency would affect the query performance. Also being in-memory means that all data is stored in memory rather than on disk. While this does increase the I/O throughput, this configuration can easily exhaust memory resources for very large databases and hence is not feasible in such cases.

The choice of the technique chosen (as detailed above) depends upon the developer needs. For applications with a higher read-write ratio, replication may be the most suitable option. Large enterprise applications with a well understood database schema and usage requirements might be better served with a Sharding configuration. Similarly, applications requiring a high rate of throughput as well as the ability to dynamically change schema (supported by MySQL Cluster) will be best served with the clustering configuration of MySQL.

Safari Books Online has the content you need

Take advantage of these MySQL resources in Safari Books Online:

Sometimes applications can go mad: tables contain wrong data, users get random replies, server stop working, and so on. Several easy methods allow users to often find the problems quickly. MySQL Troubleshooting, based on successful conference presentations by the author, cover SQL problems, memory and other server problems, replication, and problems related to particular storage engines.
MySQL Clustering offers thorough, authoritative instruction on setting up and administering a MySQL Cluster from the developers of the cluster itself. You will learn about everything from installation and configuration to performance and troubleshooting in this authoritative reference guide to MySQL clustering.
Effective MySQL: Optimizing SQL Statements is filled with detailed explanations and practical examples that can be applied immediately to improve database and application performances. Featuring a step-by-step approach to SQL optimization, this Oracle Press book helps you to analyze and tune problematic SQL statements.
Server bottlenecks and failures are a fact of life in any database deployment, but they don’t have to bring everything to a halt. MySQL provides several features that can protect you from outages, whether you’re running directly on the hardware, on virtual machines, or in the cloud. MySQL High Availability shows you how to use these features effectively, and helps you determine which combination of features will give you the most reliable system for a price you can afford.

About the authors

Salman Ul Haq is a techpreneur, co-founder and CEO of TunaCode, Inc., a startup that delivers GPU-accelerated computing solutions to time-critical application domains. He holds a degree is Computer Systems Engineering. His current focus is on delivering the right solution for cloud security. He can be reached at salman@tunacode.com.
Shaneeb Kamran is a Computer Engineer from one of the leading universities of Pakistan. His programming journey started at the age of 12 and ever since he has dabbled himself in every new and shiny software technology he could get his hands on. He is currently involved in a startup that is working on cloud computing products.

Tags: Clustering, mySQL, NoSQL, Replication, Scaling, Sharding,

Comments are closed.