Optimising Database Structure

Design Choices

MySQL keeps row data and index data in separate files. Many (almost all) other databases mix row and index data in the same file. We believe that the MySQL choice is better for a very wide range of modern systems.

Another way to store the row data is to keep the information for each column in a separate area (examples are SDBM and Focus). This will cause a performance hit for every query that accesses more than one column. Because this degenerates so quickly when more than one column is accessed, we believe that this model is not good for general-purpose databases.

The more common case is that the index and data are stored together (like in Oracle/Sybase et al.). In this case you will find the row information at the leaf page of the index. The good thing with this layout is that, depending on how well the index is cached, it saves a disk read. The bad things with this layout are:

  • Table scanning is much slower because you have to read through the indexes to get at the data.

  • You can’t use only the index table to retrieve data for a query.

  • You lose a lot of space, as you must duplicate indexes from the nodes (because you can’t store the row in the nodes).

  • Deletes will degenerate the table over time (as indexes in nodes are usually not updated on delete).

  • It’s harder to cache only the index data.

Get Your Data as Small as Possible

One of the most basic optimisation is to get your data (and indexes) to take as little space on the disk (and in memory) as possible. This can give huge improvements because disk reads are faster, and normally, less main memory will be used. Indexing also takes fewer resources if done on smaller columns.

MySQL supports a lot of different table types and row formats. Choosing the right table format may give you a big performance gain. See Chapter 7.

You can get better performance on a table and minimise storage space using the following techniques:

  • Use the most efficient (smallest) types possible. MySQL has many specialised types that save disk space and memory.

  • Use the smaller integer types if possible to get smaller tables. For example, MEDIUMINT is often better than INT.

  • Declare columns to be NOT NULL if possible. It makes everything faster and you save one bit per column. Note that if you really need NULL in your application you should definitely use it. Just avoid having it on all columns by default.

  • If you don’t have any variable-length columns (VARCHAR, TEXT, or BLOB columns), a fixed-size record format is used. This is faster but unfortunately may waste some space. See Section 7.1.2.

  • The primary index of a table should be as short as possible. This makes identification of one row easy and efficient.

  • For each table, you have to decide which storage/index method to use. See Chapter 7.

  • Only create the indexes that you really need. Indexes are good for retrieval but bad when you need to store things fast. If you mostly access a table by searching on a combination of columns, make an index on them. The first index part should be the most-used column. If you are always using many columns, you should use the column with more duplicates first to get better compression of the index.

  • If it’s very likely that a column has a unique prefix on the first number of characters, it’s better to only index this prefix. MySQL supports an index on a part of a character column. Shorter indexes are faster, not only because they take less disk space, but also because they will give you more hits in the index cache and, thus, fewer disk seeks. See Section 5.5.2.

  • In some circumstances it can be beneficial to split into two a table that is scanned very often. This is especially true if it is a dynamic-format table and it is possible to use a smaller static-format table that can be used to find the relevant rows when scanning the table.

How MySQL Uses Indexes

Indexes are used to find rows with a specific value of one column fast. Without an index MySQL has to start with the first record and then read through the whole table until it finds the relevant rows. The bigger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly get a position to seek to in the middle of the data file without having to look at all the data. If a table has 1000 rows, this is at least 100 times faster than reading sequentially. Note that if you need to access almost all 1000 rows it is faster to read sequentially because you then avoid disk seeks.

All MySQL indexes (PRIMARY, UNIQUE, and INDEX) are stored in B-trees. Strings are automatically prefix- and end-space compressed. See Section 6.5.7.

Indexes are used to:

  • Quickly find the rows that match a WHERE clause.

  • Retrieve rows from other tables when performing joins.

  • Find the MAX( ) or MIN( ) value for a specific indexed column. This is optimised by a preprocessor that checks if you are using WHERE key_part_# = constant on all key parts < N. In this case MySQL will do a single key lookup and replace the MIN( ) expression with a constant. If all expressions are replaced with constants, the query will return at once:

    SELECT MIN(key_part2),MAX(key_part2) FROM table_name where key_part1=10
  • Sort or group a table if the sorting or grouping is done on a leftmost prefix of a usable key (for example, ORDER BY key_part_1,key_part_2). The key is read in reverse order if all key parts are followed by DESC. See Section 5.2.7.

  • In some cases a query can be optimised to retrieve values without consulting the data file. If all used columns for some table are numeric and form a leftmost prefix for some key, the values may be retrieved from the index tree for greater speed:

    SELECT key_part3 FROM table_name WHERE key_part1=1

Suppose you issue the following SELECT statement:

mysql> SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;

If a multiple-column index exists on col1 and col2, the appropriate rows can be fetched directly. If separate single-column indexes exist on col1 and col2, the optimiser tries to find the most restrictive index by deciding which index will find fewer rows and using that index to fetch the rows.

If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimiser to find rows. For example, if you have a three-column index on (col1,col2,col3), you have indexed search capabilities on (col1), (col1,col2), and (col1,col2,col3).

MySQL can’t use a partial index if the columns don’t form a leftmost prefix of the index. Suppose you have the following SELECT statements:

mysql> SELECT * FROM tbl_name WHERE col1=val1;
mysql> SELECT * FROM tbl_name WHERE col2=val2;
mysql> SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;

If an index exists on (col1,col2,col3), only the first query shown in the preceding statements uses the index. The second and third queries do involve indexed columns, but (col2) and (col2,col3) are not leftmost prefixes of (col1,col2,col3).

MySQL also uses indexes for LIKE comparisons if the argument to LIKE is a constant string that doesn’t start with a wildcard character. For example, the following SELECT statements use indexes:

mysql> SELECT * FROM tbl_name WHERE key_col LIKE "Patrick%";
mysql> SELECT * FROM tbl_name WHERE key_col LIKE "Pat%_ck%";

In the first statement, only rows with Patrick' <= key_col < "Patricl" are considered. In the second statement, only rows with "Pat" <= key_col < "Pau" are considered.

The following SELECT statements will not use indexes:

mysql> SELECT * FROM tbl_name WHERE key_col LIKE "%Patrick%";
mysql> SELECT * FROM tbl_name WHERE key_col LIKE other_col;

In the first statement, the LIKE value begins with a wildcard character. In the second statement, the LIKE value is not a constant.

Searching using column_name IS NULL will use indexes if column_name is an index.

MySQL normally uses the index that finds the least number of rows. An index is used for columns that you compare with the following operators: =, >, >=, <, <=, BETWEEN, and a LIKE with a non-wildcard prefix like 'something%'.

Any index that doesn’t span all AND levels in the WHERE clause is not used to optimise the query. In other words, to be able to use an index, a prefix of the index must be used in every AND group.

The following WHERE clauses use indexes:

... WHERE index_part1=1 AND index_part2=2 AND other_column=3
... WHERE index=1 OR A=10 AND index=2      /* index = 1 OR index = 2 */
... WHERE index_part1='hello' AND index_part_3=5
          /* optimised like "index_part1='hello'" */
... WHERE index1=1 and index2=2 or index1=3 and index3=3;
          /* Can use index on index1 but not on index2 or index 3 */

These WHERE clauses do NOT use indexes:

... WHERE index_part2=1 AND index_part3=2  /* index_part_1 is not used */
... WHERE index=1 OR A=10                  /* Index is not used in
                                                        both AND parts */
... WHERE index_part1=1 OR index_part2=10  /* No index spans all rows  */

Note that in some cases MySQL will not use an index, even if one would be available. Some of the cases where this happens are:

  • If the use of the index would require MySQL to access more than 30% of the rows in the table. (In this case a table scan is probably much faster, as this will require us to do much fewer seeks.) Note that if such a query uses LIMIT to only retrieve part of the rows, MySQL will use an index anyway, as it can much more quickly find the few rows to return in the result.

  • If the index range may contain NULL values and you are using ORDER BY ... DESC

Column Indexes

All MySQL column types can be indexed. Use of indexes on the relevant columns is the best way to improve the performance of SELECT operations.

The maximum number of keys and the maximum index length are defined per table handler. See Chapter 7. You can with all table handlers have at least 16 keys and a total index length of at least 256 bytes.

For CHAR and VARCHAR columns, you can index a prefix of a column. This is much faster and requires less disk space than indexing the whole column. The syntax to use in the CREATE TABLE statement to index a column prefix looks like this:

KEY index_name (col_name(length))

The following example creates an index for the first 10 characters of the name column:

mysql> CREATE TABLE test (
    ->        name CHAR(200) NOT NULL,
    ->        KEY index_name (name(10)));

For BLOB and TEXT columns, you must index a prefix of the column. You cannot index the entire column.

In MySQL Versions 3.23.23 or later, you can also create special FULLTEXT indexes. They are used for full-text searches. Only the MyISAM table type supports FULLTEXT indexes. They can be created only from VARCHAR and TEXT columns. Indexing always happens over the entire column and partial indexing is not supported. See Section 6.8 for details.

Multiple-Column Indexes

MySQL can create indexes on multiple columns. An index may consist of up to 15 columns. (On CHAR and VARCHAR columns you can also use a prefix of the column as a part of an index.)

A multiple-column index can be considered a sorted array containing values that are created by concatenating the values of the indexed columns.

MySQL uses multiple-column indexes in such a way that queries are fast when you specify a known quantity for the first column of the index in a WHERE clause, even if you don’t specify values for the other columns.

Suppose a table is created using the following specification:

mysql> CREATE TABLE test (
    ->       id INT NOT NULL,
    ->       last_name CHAR(30) NOT NULL,
    ->       first_name CHAR(30) NOT NULL,
    ->       PRIMARY KEY (id),
    ->       INDEX name (last_name,first_name));

In this case, the index name is an index over last_name and first_name. The index will be used for queries that specify values in a known range for last_name, or for both last_name and first_name. Therefore, the name index will be used in the following queries:

mysql> SELECT * FROM test WHERE last_name="Widenius";

mysql> SELECT * FROM test WHERE last_name="Widenius"
    ->                    AND first_name="Michael";

mysql> SELECT * FROM test WHERE last_name="Widenius"
    ->                    AND (first_name="Michael" OR first_name="Monty");

mysql> SELECT * FROM test WHERE last_name="Widenius"
    ->                    AND first_name >="M" AND first_name < "N";

However, the name index will not be used in the following queries:

mysql> SELECT * FROM test WHERE first_name="Michael";

mysql> SELECT * FROM test WHERE last_name="Widenius"
    ->                    OR first_name="Michael";

For more information on the manner in which MySQL uses indexes to improve query performance, see Section 5.4.3.

Why So Many Open tables?

When you run mysqladmin status, you’ll see something like this:

Uptime: 426 Running threads: 1 Questions: 11082 Reloads: 1 Open tables: 12

This can be somewhat perplexing if you only have 6 tables.

MySQL is multi-threaded, so it may have many queries on the same table simultaneously. To minimise the problem with two threads having different states on the same file, the table is opened independently by each concurrent thread. This takes some memory but will normally increase performance. Wth ISAM and MyISAM tables this also requires one extra file descriptor for the data file. With these table types the index file descriptor is shared between all threads.

You can read more about this topic in the next section. See Section 5.4.7.

How MySQL Opens and Closes Tables

table_cache, max_connections, and max_tmp_tables affect the maximum number of files the server keeps open. If you increase one or both of these values, you may run up against a limit imposed by your operating system on the per-process number of open file descriptors. However, you can increase the limit on many systems. Consult your OS documentation to find out how to do this because the method for changing the limit varies widely from system to system.

table_cache is related to max_connections. For example, for 200 concurrent running connections, you should have a table cache of at least 200 * n, where n is the maximum number of tables in a join. You also need to reserve some extra file descriptors for temporary tables and files.

Make sure that your operating system can handle the number of open file descriptors implied by the table_cache setting. If table_cache is set too high, MySQL may run out of file descriptors and refuse connections, fail to perform queries, and be very unreliable. You also have to take into account that the MyISAM table handler needs two file descriptors for each unique open table. You can increase the number of file descriptors available for MySQL with the --open-files-limit=# startup option. See Section A.2.16.

The cache of open tables will be kept at a level of table_cache entries (default 64; this can be changed with the -O table_cache=# option to mysqld). Note that MySQL may temporarily open even more tables to be able to execute queries.

An unused table is closed and removed from the table cache under the following circumstances:

  • When the cache is full and a thread tries to open a table that is not in the cache.

  • When the cache contains more than table_cache entries and a thread is no longer using a table.

  • When someone executes mysqladmin refresh or mysqladmin flush-tables.

  • When someone executes ‘FLUSH TABLES.’

When the table cache fills up, the server uses the following procedure to locate a cache entry to use:

  • Tables that are not currently in use are released, in least-recently-used order.

  • If the cache is full and no tables can be released, but a new table needs to be opened, the cache is temporarily extended as necessary.

  • If the cache is in a temporarily extended state and a table goes from in-use to not-in-use state, the table is closed and released from the cache.

A table is opened for each concurrent access. This means that if you have two threads accessing the same table, or if you access the table twice in the same query (with AS) the table needs to be opened twice. The first open of any table takes two file descriptors; each additional use of the table takes only one file descriptor. The extra descriptor for the first open is used for the index file; this descriptor is shared among all threads.

If you are opening a table with the HANDLER table_name OPEN statement, a dedicated table object is allocated for the thread. This table object is not shared by other threads and will not be closed until the thread calls HANDLER table_name CLOSE, or until the thread dies. See Section 6.4.2. When this happens, the table is put back in the table_cache (if it isn’t full).

You can check if your table cache is too small by checking the mysqld variable Opened_tables. If this is quite big, even if you haven’t done a lot of FLUSH TABLES, you should increase your table cache. See Section 4.5.6.3.

Drawbacks to Creating Large Numbers of Tables in the Same Database

If you have many files in a directory, open, close, and create operations will be slow. If you execute SELECT statements on many different tables, there will be a little overhead when the table cache is full because for every table that has to be opened, another must be closed. You can reduce this overhead by making the table cache larger.

Get MySQL Reference Manual now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.