This book can be useful for a number of readers: a developer trying to extend MySQL in some way; a DBA or database application programmer interested in how exactly MySQL runs his queries; a computer science student learning about database kernel development; a developer looking for ideas while working on a product that requires extensive database functionality that he must implement himself; a closed-source database developer wondering how in the world MySQL runs its queries so fast; a random, curious computer geek who has used MySQL some and wonders what is inside; and, of course, anybody who wants to look smart by having a book on MySQL internals displayed on his shelf.
Although MySQL source is open in the sense of being publicly available, it is in essence closed to you if you do not understand it. It may be intimidating to look at several hundred thousand lines of code written by gifted programmers that elegantly and efficiently solves difficult problems one line at a time. To understand the code, you will need a measure of the inspiration and perspiration of those who created it. Hopefully, this book can provide enough guidance to remove those barriers and to open the source of MySQL for you.
I do not believe it is possible to understand and appreciate MySQL strictly through a conceptual discussion. On a high conceptual level MySQL is very simple. It does not implement many revolutionary ideas. It sticks to the basics. Why is it so popular then? Why do we know enough about it for O'Reilly to be willing to publish a book on its internals?
The reason, in my opinion, is that what makes a good database is not so much the concepts behind it, but how well they are implemented. It is important to be conceptually sound on a basic level, but a good portion of the genius is in implementing those concepts in a way that provides a reasonable combination of good performance and the ease of maintenance. In other words, the devil is in the details, and MySQL developers have done a great job of taking that devil by the horns and twisting his head off.
Thus, in order to appreciate the inner workings of MySQL, you need to get close to the places where that devil is being subdued. Somewhere in the dark depths of the optimizer or inside the B-tree, there is music to be heard as you study the code. It will take some work to hear that music, but once you do, you can feel its beauty. And to hear the music you must not be afraid to compile the code, add a few debugging messages to help you understand the flow, and perhaps even change a few things to appreciate what will make the server crash (and how) if you fail to handle something that turns out to be important after all.
The first chapter provides a brief introduction of how different components of MySQL work together. Immediately afterward you will find a chapter about downloading and building MySQL from the source. You will have a much more meaningful experience studying MySQL internals if you follow the steps in it to get set up with a working, compilable copy of the code that you can change and test at your pleasure.
When approaching a new code base, I find it very useful to look at class/structure definitions and API call prototypes. I have a confession to make: I first look at the code, then read the comments, and I never look at block diagrams unless somebody asks me to. Chapter 3 is for the developers whose heads are wired like mine; it talks about the core server classes, structures, and API.
In Chapter 4 I talk about the communication protocol between the client and the server. Afterward, I hope you will say: "I am thankful for the MySQL API, and I even have a clue of how to fix it up if I had to!"
Chapter 5 discusses server configuration variables. Configuration variables are the controls of the server. Every one of them tells you about some special server capability or perhaps a problem some DBA had to solve at some point. It would not be too much of an exaggeration to say that if you understand the variables, you understand the server. Toward the end you will find a tutorial on how to add your own configuration variables.
Every server has to deal with the issue of how to handle multiple clients concurrently. MySQL does it using threads. Understanding threads and how they are used in MySQL is critical to being effective in working with MySQL source. Thus, Chapter 6 discusses thread-based request handling.
One of the distinct features of the MySQL architecture is its ability to integrate third-party storage engines. Chapter 7 focuses on the storage engine interface and provides a functional example of a simple storage engine.
Although at the moment MySQL supports a number of page and row-level locking storage engines, the core architecture has a strong MyISAM heritage. Part of that heritage is the mechanism to acquire a table lock. The table lock awareness, even when it is in essence a token lock, is important for an aspiring MySQL developer. Thus, Chapter 8 focuses on the table lock manager.
Chapter 9 focuses on the parser and optimizer. This is the chapter I would recommend to a DBA trying to improve the performance of MySQL. The key to optimizing MySQL queries and tables is to learn to think like the optimizer. This chapter also provides an overview of the source code for the brave developers preparing to immerse themselves into the optimizer's dark depths.
Chapter 10 is a cursory overview of MySQL storage engines. It may be helpful to a developer trying to create or integrate her own. A curious reader looking for what is out there may also find it of interest.
By no means is this book a comprehensive guide to MySQL internals. The subject is so deep that I do not believe it is humanly possible to scratch the surface even if you had 10,000 pages and the time to create them. To make matters more complicated, MySQL developers are adding new code daily. Fortunately, most of the core code tends to remain intact, so the book has a shot at not becoming obsolete before it is published. Nevertheless, do not be surprised when you look at the current MySQL code and find that some things are not quite like what you see in the book. You are likely to see new classes and calls in the API. On occasion, you may find that an old API call has a new argument. But hopefully the book can always serve as a guide to teach you enough basics about the code to bring you to a level of proficiency that will enable you to accomplish your goals.