Cover image for High Performance MySQL, 2nd Edition

Book description

High Performance MySQL is the definitive guide to building fast, reliable systems with MySQL. Written by noted experts with years of real-world experience building very large systems, this book covers every aspect of MySQL performance in detail, and focuses on robustness, security, and data integrity. High Performance MySQL teaches you advanced techniques in depth so you can bring out MySQL's full power. Learn how to design schemas, indexes, queries and advanced MySQL features for maximum performance, and get detailed guidance for tuning your MySQL server, operating system, and hardware to their fullest potential. You'll also learn practical, safe, high-performance ways to scale your applications with replication, load balancing, high availability, and failover. This second edition is completely revised and greatly expanded, with deeper coverage in all areas. Major additions include:

  • Emphasis throughout on both performance and reliability

  • Thorough coverage of storage engines, including in-depth tuning and optimizations for the InnoDB storage engine

  • Effects of new features in MySQL 5.0 and 5.1, including stored procedures, partitioned databases, triggers, and views

  • A detailed discussion on how to build very large, highly scalable systems with MySQL

  • New options for backups and replication

  • Optimization of advanced querying features, such as full-text searches

  • Four new appendices

The book also includes chapters on benchmarking, profiling, backups, security, and tools and techniques to help you measure, monitor, and manage your MySQL installations.

Table of Contents

  1. High Performance MySQL, 2nd Edition
  2. Foreword
  3. Preface
    1. How This Book Is Organized
      1. A Broad Overview
      2. Building a Solid Foundation
      3. Tuning Your Application
      4. Scaling Upward After Making Changes
      5. Making Your Application Reliable
      6. Miscellaneous Useful Topics
    2. Software Versions and Availability
    3. Conventions Used in This Book
    4. Using Code Examples
    5. Safari® Books Online
    6. How to Contact Us
    7. Acknowledgments for the Second Edition
      1. From Baron
      2. From Peter
      3. From Vadim
      4. From Arjen
    8. Acknowledgments for the First Edition
      1. From Jeremy
      2. From Derek
  4. 1. MySQL Architecture
    1. MySQL’s Logical Architecture
      1. Connection Management and Security
      2. Optimization and Execution
    2. Concurrency Control
      1. Read/Write Locks
      2. Lock Granularity
        1. Table locks
        2. Row locks
    3. Transactions
      1. Isolation Levels
      2. Deadlocks
      3. Transaction Logging
      4. Transactions in MySQL
        1. AUTOCOMMIT
        2. Mixing storage engines in transactions
        3. Implicit and explicit locking
    4. Multiversion Concurrency Control
    5. MySQL’s Storage Engines
      1. The MyISAM Engine
        1. Storage
        2. MyISAM features
        3. Compressed MyISAM tables
      2. The MyISAM Merge Engine
      3. The InnoDB Engine
      4. The Memory Engine
      5. The Archive Engine
      6. The CSV Engine
      7. The Federated Engine
      8. The Blackhole Engine
      9. The NDB Cluster Engine
      10. The Falcon Engine
      11. The solidDB Engine
      12. The PBXT (Primebase XT) Engine
      13. The Maria Storage Engine
      14. Other Storage Engines
      15. Selecting the Right Engine
      16. Considerations
      17. Practical Examples
        1. Logging
        2. Read-only or read-mostly tables
        3. Order processing
        4. Stock quotes
        5. Bulletin boards and threaded discussion forums
        6. CD-ROM applications
      18. Storage Engine Summary
      19. Table Conversions
        1. ALTER TABLE
        2. Dump and import
        3. CREATE and SELECT
  5. 2. Finding Bottlenecks: Benchmarking and Profiling
    1. Why Benchmark?
    2. Benchmarking Strategies
      1. What to Measure
    3. Benchmarking Tactics
      1. Designing and Planning a Benchmark
      2. Getting Accurate Results
      3. Running the Benchmark and Analyzing Results
    4. Benchmarking Tools
      1. Full-Stack Tools
      2. Single-Component Tools
    5. Benchmarking Examples
      1. http_load
      2. sysbench
        1. The sysbench CPU benchmark
        2. The sysbench file I/O benchmark
        3. The sysbench OLTP benchmark
        4. Other sysbench features
      3. dbt2 TPC-C on the Database Test Suite
      4. MySQL Benchmark Suite
    6. Profiling
      1. Profiling an Application
        1. How and what to measure
        2. A PHP profiling example
      2. MySQL Profiling
        1. Logging queries
        2. Finer control over logging
        3. How to read the slow query log
        4. Log analysis tools
      3. Profiling a MySQL Server
      4. Profiling Queries with SHOW STATUS
      5. SHOW PROFILE
      6. Other Ways to Profile MySQL
      7. When You Can’t Add Profiling Code
    7. Operating System Profiling
      1. Troubleshooting MySQL Connections and Processes
      2. Advanced Profiling and Troubleshooting
  6. 3. Schema Optimization and Indexing
    1. Choosing Optimal Data Types
      1. Whole Numbers
      2. Real Numbers
      3. String Types
        1. VARCHAR and CHAR types
        2. BLOB and TEXT types
        3. Using ENUM instead of a string type
      4. Date and Time Types
      5. Bit-Packed Data Types
      6. Choosing Identifiers
      7. Special Types of Data
    2. Indexing Basics
      1. Types of Indexes
        1. B-Tree indexes
        2. Hash indexes
        3. Spatial (R-Tree) indexes
        4. Full-text indexes
    3. Indexing Strategies for High Performance
      1. Isolate the Column
      2. Prefix Indexes and Index Selectivity
      3. Clustered Indexes
        1. Comparison of InnoDB and MyISAM data layout
        2. Inserting rows in primary key order with InnoDB
      4. Covering Indexes
      5. Using Index Scans for Sorts
      6. Packed (Prefix-Compressed) Indexes
      7. Redundant and Duplicate Indexes
      8. Indexes and Locking
    4. An Indexing Case Study
      1. Supporting Many Kinds of Filtering
      2. Avoiding Multiple Range Conditions
      3. Optimizing Sorts
    5. Index and Table Maintenance
      1. Finding and Repairing Table Corruption
      2. Updating Index Statistics
      3. Reducing Index and Data Fragmentation
    6. Normalization and Denormalization
      1. Pros and Cons of a Normalized Schema
      2. Pros and Cons of a Denormalized Schema
      3. A Mixture of Normalized and Denormalized
      4. Cache and Summary Tables
        1. Counter tables
    7. Speeding Up ALTER TABLE
      1. Modifying Only the .frm File
      2. Building MyISAM Indexes Quickly
    8. Notes on Storage Engines
      1. The MyISAM Storage Engine
      2. The Memory Storage Engine
      3. The InnoDB Storage Engine
  7. 4. Query Performance Optimization
    1. Slow Query Basics: Optimize Data Access
      1. Are You Asking the Database for Data You Don’t Need?
      2. Is MySQL Examining Too Much Data?
        1. Execution time
        2. Rows examined and rows returned
        3. Rows examined and access types
    2. Ways to Restructure Queries
      1. Complex Queries Versus Many Queries
      2. Chopping Up a Query
      3. Join Decomposition
    3. Query Execution Basics
      1. The MySQL Client/Server Protocol
        1. Query states
      2. The Query Cache
      3. The Query Optimization Process
        1. The parser and the preprocessor
        2. The query optimizer
        3. Table and index statistics
        4. MySQL’s join execution strategy
        5. The execution plan
        6. The join optimizer
        7. Sort optimizations
      4. The Query Execution Engine
      5. Returning Results to the Client
    4. Limitations of the MySQL Query Optimizer
      1. Correlated Subqueries
        1. When a correlated subquery is good
        2. UNION limitations
        3. Index merge optimizations
        4. Equality propagation
        5. Parallel execution
        6. Hash joins
        7. Loose index scans
        8. MIN() and MAX()
        9. SELECT and UPDATE on the same table
    5. Optimizing Specific Types of Queries
      1. Optimizing COUNT() Queries
        1. What COUNT() does
        2. Myths about MyISAM
        3. Simple optimizations
        4. More complex optimizations
      2. Optimizing JOIN Queries
      3. Optimizing Subqueries
      4. Optimizing GROUP BY and DISTINCT
        1. Optimizing GROUP BY WITH ROLLUP
      5. Optimizing LIMIT and OFFSET
      6. Optimizing SQL_CALC_FOUND_ROWS
      7. Optimizing UNION
    6. Query Optimizer Hints
    7. User-Defined Variables
      1. Be Careful with MySQL Upgrades
  8. 5. Advanced MySQL Features
    1. The MySQL Query Cache
      1. How MySQL Checks for a Cache Hit
      2. How the Cache Uses Memory
      3. When the Query Cache Is Helpful
      4. How to Tune and Maintain the Query Cache
        1. Reducing fragmentation
        2. Improving query cache usage
      5. InnoDB and the Query Cache
      6. General Query Cache Optimizations
      7. Alternatives to the Query Cache
    2. Storing Code Inside MySQL
      1. Stored Procedures and Functions
      2. Triggers
      3. Events
      4. Preserving Comments in Stored Code
    3. Cursors
    4. Prepared Statements
      1. Prepared Statement Optimization
      2. The SQL Interface to Prepared Statements
      3. Limitations of Prepared Statements
    5. User-Defined Functions
    6. Views
      1. Updatable Views
      2. Performance Implications of Views
      3. Limitations of Views
    7. Character Sets and Collations
      1. How MySQL Uses Character Sets
        1. Defaults for creating objects
        2. Settings for client/server communication
        3. How MySQL compares values
        4. Special-case behaviors
      2. Choosing a Character Set and Collation
      3. How Character Sets and Collations Affect Queries
    8. Full-Text Searching
      1. Natural-Language Full-Text Searches
      2. Boolean Full-Text Searches
      3. Full-Text Changes in MySQL 5.1 and Beyond
      4. Full-Text Tradeoffs and Workarounds
      5. Full-Text Tuning and Optimization
    9. Foreign Key Constraints
    10. Merge Tables and Partitioning
      1. Merge Tables
        1. Merge table performance impacts
        2. Merge table strengths
      2. Partitioned Tables
        1. Why partitioning works
        2. Partitioning examples
        3. Partitioned table limitations
        4. Optimizing queries against partitioned tables
    11. Distributed (XA) Transactions
      1. Internal XA Transactions
      2. External XA Transactions
  9. 6. Optimizing Server Settings
    1. Configuration Basics
      1. Syntax, Scope, and Dynamism
      2. Side Effects of Setting Variables
      3. Getting Started
    2. General Tuning
      1. Tuning Memory Usage
        1. How much memory can MySQL use?
        2. Per-connection memory needs
        3. Reserving memory for the operating system
        4. Allocating memory for caches
      2. The MyISAM Key Cache
        1. The MyISAM key block size
      3. The InnoDB Buffer Pool
      4. The Thread Cache
      5. The Table Cache
      6. The InnoDB Data Dictionary
    3. Tuning MySQL’s I/O Behavior
      1. MyISAM I/O Tuning
      2. InnoDB I/O Tuning
        1. The InnoDB transaction log
        2. How InnoDB opens and flushes log and data files
        3. The InnoDB tablespace
        4. The doublewrite buffer
        5. Other I/O tuning options
    4. Tuning MySQL Concurrency
      1. MyISAM Concurrency Tuning
      2. InnoDB Concurrency Tuning
    5. Workload-Based Tuning
      1. Optimizing for BLOB and TEXT Workloads
        1. Optimizing for Filesorts
      2. Inspecting MySQL Server Status Variables
    6. Tuning Per-Connection Settings
  10. 7. Operating System and Hardware Optimization
    1. What Limits MySQL’s Performance?
    2. How to Select CPUs for MySQL
      1. Which Is Better: Fast CPUs or Many CPUs?
      2. CPU Architecture
      3. Scaling to Many CPUs and Cores
    3. Balancing Memory and Disk Resources
      1. Random Versus Sequential I/O
      2. Caching, Reads, and Writes
      3. What’s Your Working Set?
        1. The working set and the cache unit
      4. Finding an Effective Memory-to-Disk Ratio
      5. Choosing Hard Disks
    4. Choosing Hardware for a Slave
    5. RAID Performance Optimization
      1. RAID Failure, Recovery, and Monitoring
      2. Balancing Hardware RAID and Software RAID
      3. RAID Configuration and Caching
        1. The RAID stripe chunk size
        2. The RAID cache
    6. Storage Area Networks and Network-Attached Storage
      1. Storage Area Networks
      2. Network-Attached Storage
    7. Using Multiple Disk Volumes
    8. Network Configuration
    9. Choosing an Operating System
    10. Choosing a Filesystem
    11. Threading
    12. Swapping
    13. Operating System Status
      1. How to Read vmstat Output
      2. How to Read iostat Output
      3. A CPU-Bound Machine
      4. An I/O-Bound Machine
      5. A Swapping Machine
      6. An Idle Machine
  11. 8. Replication
    1. Replication Overview
      1. Problems Solved by Replication
      2. How Replication Works
    2. Setting Up Replication
      1. Creating Replication Accounts
      2. Configuring the Master and Slave
      3. Starting the Slave
      4. Initializing a Slave from Another Server
      5. Recommended Replication Configuration
    3. Replication Under the Hood
      1. Statement-Based Replication
      2. Row-Based Replication
      3. Replication Files
      4. Sending Replication Events to Other Slaves
      5. Replication Filters
    4. Replication Topologies
      1. Master and Multiple Slaves
      2. Master-Master in Active-Active Mode
      3. Master-Master in Active-Passive Mode
      4. Master-Master with Slaves
      5. Ring
      6. Master, Distribution Master, and Slaves
      7. Tree or Pyramid
      8. Custom Replication Solutions
        1. Selective replication
        2. Separating functions
        3. Data archiving
        4. Using slaves for full-text searches
        5. Read-only slaves
        6. Emulating multimaster replication
        7. Creating a log server
    5. Replication and Capacity Planning
      1. Why Replication Doesn’t Help Scale Writes
      2. Plan to Underutilize
    6. Replication Administration and Maintenance
      1. Monitoring Replication
      2. Measuring Slave Lag
      3. Determining Whether Slaves Are Consistent with the Master
      4. Resyncing a Slave from the Master
      5. Changing Masters
        1. Planned promotions
        2. Unplanned promotions
        3. Locating the desired log positions
      6. Switching Roles in a Master-Master Configuration
    7. Replication Problems and Solutions
      1. Errors Caused by Data Corruption or Loss
      2. Using Nontransactional Tables
      3. Mixing Transactional and Nontransactional Tables
      4. Nondeterministic Statements
      5. Different Storage Engines on the Master and Slave
      6. Data Changes on the Slave
      7. Nonunique Server IDs
      8. Undefined Server IDs
      9. Dependencies on Nonreplicated Data
      10. Missing Temporary Tables
      11. Not Replicating All Updates
      12. Lock Contention Caused by InnoDB Locking Selects
      13. Writing to Both Masters in Master-Master Replication
      14. Excessive Replication Lag
        1. Don’t duplicate the expensive part of writes
        2. Do writes in parallel outside of replication
        3. Prime the cache for the slave thread
      15. Oversized Packets from the Master
      16. Limited Replication Bandwidth
      17. No Disk Space
      18. Replication Limitations
    8. How Fast Is Replication?
    9. The Future of MySQL Replication
  12. 9. Scaling and High Availability
    1. Terminology
    2. Scaling MySQL
      1. Planning for Scalability
      2. Buying Time Before Scaling
      3. Scaling Up
      4. Scaling Out
        1. Functional partitioning
        2. Data sharding
        3. Choosing a partitioning key
        4. Querying across shards
        5. Allocating data, shards, and nodes
        6. Fixed allocation
        7. Dynamic allocation
        8. Explicit allocation
        9. Rebalancing shards
        10. Generating globally unique IDs
        11. Tools for sharding
      5. Scaling Back
        1. Keeping active data separate
      6. Scaling by Clustering
        1. Clustering
        2. Federation
    3. Load Balancing
      1. Connecting Directly
        1. Splitting reads and writes in replication
        2. Changing the application configuration
        3. Changing DNS names
        4. Moving IP addresses
      2. Introducing a Middleman
        1. Load balancers
        2. Load-balancing algorithms
        3. Adding and removing servers in the pool
      3. Load Balancing with a Master and Multiple Slaves
    4. High Availability
      1. Planning for High Availability
      2. Adding Redundancy
        1. Shared-storage architectures
        2. Replicated-disk architectures
        3. Synchronous MySQL replication
      3. Failover and Failback
        1. Promoting a slave or switching roles
        2. Virtual IP addresses or IP takeover
        3. The MySQL Master-Master Replication Manager
        4. Middleman solutions
        5. Handling failover in the application
  13. 10. Application-Level Optimization
    1. Application Performance Overview
      1. Find the Source of the Problem
      2. Look for Common Problems
    2. Web Server Issues
      1. Finding the Optimal Concurrency
    3. Caching
      1. Caching Below the Application
      2. Application-Level Caching
      3. Cache Control Policies
      4. Cache Object Hierarchies
      5. Pregenerating Content
    4. Extending MySQL
    5. Alternatives to MySQL
  14. 11. Backup and Recovery
    1. Overview
      1. Terminology
      2. It’s All About Recovery
      3. Topics We Won’t Cover
      4. The Big Picture
      5. Why Backups?
    2. Considerations and Tradeoffs
      1. What Can You Afford to Lose?
      2. Online or Offline Backups?
      3. Logical or Raw Backups?
        1. Logical backups
        2. Raw backups
      4. What to Back Up
        1. Incremental backups
      5. Storage Engines and Consistency
        1. Data consistency
        2. File consistency
      6. Replication
    3. Managing and Backing Up Binary Logs
      1. The Binary Log Format
      2. Purging Old Binary Logs Safely
    4. Backing Up Data
      1. Making a Logical Backup
        1. SQL dumps
        2. Delimited file backups
        3. Parallel dump and restore
      2. Filesystem Snapshots
        1. How LVM snapshots work
        2. Prerequisites and configuration
        3. Creating, mounting, and removing an LVM snapshot
        4. LVM snapshots for online backups
        5. Lock-free InnoDB backups with LVM snapshots
        6. Planning for LVM backups
        7. Other uses and alternatives
    5. Recovering from a Backup
      1. Limiting Access to MySQL
      2. Restoring Raw Files
        1. Starting MySQL after restoring raw files
      3. Restoring Logical Backups
        1. Loading SQL files
        2. Loading delimited files
      4. Point-in-Time Recovery
      5. More Advanced Recovery Techniques
        1. Delayed replication for fast recovery
        2. Recovering with a log server
      6. InnoDB Recovery
        1. Causes of InnoDB corruption
        2. How to recover corrupted InnoDB data
    6. Backup and Recovery Speed
    7. Backup Tools
      1. mysqldump
      2. mysqlhotcopy
      3. InnoDB Hot Backup
      4. mk-parallel-dump
      5. mylvmbackup
      6. Zmanda Recovery Manager
        1. Installing and testing ZRM
      7. R1Soft
      8. MySQL Online Backup
      9. Comparison of Backup Tools
    8. Scripting Backups
  15. 12. Security
    1. Terminology
    2. Account Basics
      1. Privileges
      2. The Grant Tables
      3. How MySQL Checks Privileges
      4. Adding, Removing, and Viewing Grants
      5. Setting Up MySQL Privileges
      6. Privilege Changes in MySQL 4.1
      7. Privilege Changes in MySQL 5.0
        1. Stored routines
        2. Triggers
        3. Views
        4. Privileges on the INFORMATION_SCHEMA tables
      8. Privileges and Performance
      9. Common Problems and Solutions
        1. Connecting through localhost versus 127.0.0.1
        2. Using temporary tables safely
        3. Disallowing passwordless access
        4. Disabling anonymous users
        5. Remember to quote hostnames separately
        6. Don’t reuse usernames
        7. Granting SELECT allows SHOW CREATE TABLE
        8. Don’t grant privileges on the mysql database
        9. Don’t grant the SUPER privilege freely
        10. Granting privileges on wildcarded databases
        11. Revoking specific privileges
        12. Users can connect even after REVOKE
        13. When you can’t grant or revoke a privilege
        14. Invisible privileges
        15. Obsolete privileges
    3. Operating System Security
      1. Guidelines
    4. Network Security
      1. Localhost-Only Connections
      2. Firewalling
        1. No default route
      3. MySQL in a DMZ
      4. Connection Encryption and Tunneling
        1. Virtual private networks
        2. SSL in MySQL
        3. SSH tunneling
      5. TCP Wrappers
      6. Automatic Host Blocking
    5. Data Encryption
      1. Hashing Passwords
      2. Encrypted Filesystems
      3. Application-Level Encryption
        1. Design issues
        2. Encrypting and decrypting inside MySQL
      4. Source Code Modification
    6. MySQL in a chrooted Environment
  16. 13. MySQL Server Status
    1. System Variables
    2. SHOW STATUS
      1. Thread and Connection Statistics
      2. Binary Logging Status
      3. Command Counters
      4. Temporary Files and Tables
      5. Handler Operations
      6. MyISAM Key Buffer
      7. File Descriptors
      8. Query Cache
      9. SELECT Types
      10. Sorts
      11. Table Locking
      12. Secure Sockets Layer (SSL)
      13. InnoDB-Specific
      14. Plug-in-Specific
      15. Miscellaneous
    3. SHOW INNODB STATUS
      1. Header
      2. SEMAPHORES
      3. LATEST FOREIGN KEY ERROR
      4. LATEST DETECTED DEADLOCK
      5. TRANSACTIONS
      6. FILE I/O
      7. INSERT BUFFER AND ADAPTIVE HASH INDEX
      8. LOG
      9. BUFFER POOL AND MEMORY
      10. ROW OPERATIONS
    4. SHOW PROCESSLIST
    5. SHOW MUTEX STATUS
    6. Replication Status
    7. INFORMATION_SCHEMA
  17. 14. Tools for High Performance
    1. Interface Tools
      1. MySQL Visual Tools
      2. SQLyog
      3. phpMyAdmin
    2. Monitoring Tools
      1. Noninteractive Monitoring Systems
        1. Homegrown systems
        2. Nagios
        3. Alternatives to Nagios
        4. MySQL Monitoring and Advisory Service
        5. MONyog
        6. RRDTool-based systems
      2. Interactive Tools
        1. innotop
    3. Analysis Tools
      1. HackMySQL Tools
      2. Maatkit Analysis Tools
    4. MySQL Utilities
      1. MySQL Proxy
      2. Dormando’s Proxy for MySQL
      3. Maatkit Utilities
    5. Sources of Further Information
  18. A. Transferring Large Files
    1. Copying Files
      1. A Naive Example
      2. A One-Step Method
      3. Avoiding Encryption Overhead
      4. Other Options
    2. File Copy Benchmarks
  19. B. Using EXPLAIN
    1. Invoking EXPLAIN
      1. Rewriting Non-SELECT Queries
    2. The Columns in EXPLAIN
      1. The id Column
      2. The select_type Column
      3. The table Column
        1. Derived tables and unions
        2. An example of complex SELECT types
      4. The type Column
      5. The possible_keys Column
      6. The key Column
      7. The key_len Column
      8. The ref Column
      9. The rows Column
      10. The filtered Column
      11. The Extra Column
    3. Visual EXPLAIN
  20. C. Using Sphinx with MySQL
    1. Overview: A Typical Sphinx Search
    2. Why Use Sphinx?
      1. Efficient and Scalable Full-Text Searching
      2. Applying WHERE Clauses Efficiently
      3. Finding the Top Results in Order
      4. Optimizing GROUP BY Queries
      5. Generating Parallel Result Sets
      6. Scaling
      7. Aggregating Sharded Data
    3. Architectural Overview
      1. Installation Overview
      2. Typical Partition Use
    4. Special Features
      1. Phrase Proximity Ranking
      2. Support for Attributes
      3. Filtering
      4. The SphinxSE Pluggable Storage Engine
      5. Advanced Performance Control
    5. Practical Implementation Examples
      1. Full-Text Searching on Mininova.org
      2. Full-Text Searching on BoardReader.com
      3. Optimizing Selects on Sahibinden.com
      4. Optimizing GROUP BY on BoardReader.com
      5. Optimizing Sharded JOIN Queries on Grouply.com
    6. Conclusion
  21. D. Debugging Locks
    1. Lock Waits at the Server Level
      1. Table Locks
        1. Finding out who holds a lock
      2. The Global Read Lock
      3. Name Locks
      4. User Locks
    2. Lock Waits in Storage Engines
      1. InnoDB Lock Waits
        1. Toward more usable lock output
      2. Falcon Lock Waits
  22. Index
  23. About the Authors
  24. Colophon
  25. Copyright