Tuning the Database System

Careful index design is one technique that improves the speed of a DBMS and can reduce the resource requirements of a database. However, comprehensive database tuning is a complex topic that fills many books. We include in this section only a few additional practical ideas to begin to improve the performance of a database system.

As discussed previously, accessing a hard disk is slow and is usually the bottleneck in DBMS performance. More specifically, disk seeking-moving the disk head to get information from another location of the disk-is the slowest component of disk access. Therefore, most techniques described in this section are also techniques that improve performance by minimizing disk space requirements.[3]

[3] Reducing disk space requirements improves both disk seek and read performance. Disk read performance is improved because less data is required to be transferred, while seek performance is improved because the disk head has to move less on average when randomly accessing a smaller file than when accessing a larger file.

Here are some ways to improve DBMS performance:

  • Carefully choose attribute types and lengths. Where possible, use small variants such as SMALLINT or MEDIUMINT rather than the regular choice INT. When using fixed-length attributes, such as CHAR, specify a length that is as short as practical.

  • Use fixed-length attributes; that is, try to avoid types such as VARCHAR or BLOB. While fixed-length text attributes may waste space, scanning fixed-length rows in a query is much faster than scanning variable-length rows.

  • Design indexes with care. As discussed in the last section, keep the primary key index as small as possible, create only indexes that are needed, and use prefixes of attributes where possible. Ensure that the leftmost attribute in the index is the most frequently used in queries and, if all attributes are used, make sure the leftmost attribute is the one with the highest number of duplicate entries.

  • Create a statistics table if aggregate functions such as COUNT( ) or SUM( ) are frequently used in queries on large tables. A statistics table stores only one row that is manually updated with the aggregate values of another table. For example, if the statistics table maintains the count of rows in a large customer table, each time a row is inserted or deleted in the customer table, the count is updated in the statistics table. For large tables, this is often faster than calculating aggregate functions with the slow built-in functions that require complete processing of all rows.

  • If large numbers of rows are deleted from a table, or a table containing variable-length attributes is frequently modified, disk space may be wasted. MySQL doesn't usually remove deleted or modified data; it only marks the location as being no longer in use. Wasted space can affect access speed.

    To reorganize a table-by copying data to a temporary location and back again-MySQL provides the OPTIMIZE TABLE command, which should be used periodically. For example:

    OPTIMIZE TABLE customer;

    The OPTIMIZE command should be run when the DBMS is offline for scheduled maintenance. The command is nonstandard SQL.

  • It is possible to create different table types for specific tasks. The default in MySQL is the MyISAM type, and all the tables described so far are this table type. For small, temporary, frequently used lookup tables, a different type, the heap table type, can be used. There are other types, and we briefly discuss alternatives in Chapter 6. More details are provided in Section 9.4 of the MySQL user manual.

  • Section 10.7 of the MySQL manual includes other excellent ideas for simple performance improvement.

Another aspect of database tuning is optimizing the performance of the DBMS itself. Included with the MySQL installation is the mysqladmin tool for database administration. Details of the system setup can be found by running the following command from a Linux shell:

% mysqladmin -ppassword variables

This shows, in part, the following selected system parameters:

join_buffer             current value: 131072
key_buffer              current value: 8388600
net_buffer_length       current value: 16384
record_buffer           current value: 131072
sort_buffer             current value: 2097144
table_cache             current value: 64

The important parameters are those that impact disk use. MySQL has several main-memory buffer parameters that control how much data is kept in memory for processing. These include:

  • The record_buffer for scanning all rows in a table

  • The sort_buffer for ORDER BY and GROUP BY operations

  • The key_buffer for storing indexes in main memory

  • The join_buffer for joins that don't use indexes

In general, the larger these buffers, the more data from disk is cached or stored in memory and the fewer disk accesses are required. However, if the sum of these parameters is near to exceeding the size of the memory installed in the server, the underlying operating system will start to swap data between disk and memory, and the DBMS will be slow. In any case, careful experimentation based on the application is likely to improve DBMS performance.

Section 10.2.3 of the MySQL manual suggests parameter settings when starting the MySQL server. First, for machines with at least 64 MB of memory, large tables in the DBMS, and a moderate number of users, use:

safe_mysqld -O key_buffer=16M -O table_cache=128 \
            -O sort_buffer=4M -O record_buffer=1M &

Second, if there is less than 64 MB of memory available, and there are many users, try the following:

safe_mysqld -O key_buffer=512k -O sort_buffer=100k \
            -O record_buffer=100k &

The following setting might be appropriate for the winestore, because many users are expected, the queries are largely index-based, and the database is small:

safe_mysqld -O key_buffer=512k -O sort_buffer=16k \
            -O table_cache=32 -O record_buffer=8k
            -O net_buffer=1K &

Even more conservative settings might also be acceptable.

There are two other parameters we have not discussed. The table_cache parameter manages the maximum number of open tables per user connection, while the net_buffer parameter sets the minimum size of the network query buffer in which incoming queries are kept before they are executed.

The mysqladmin utility can report the status of the DBMS:

% mysqladmin -ppassword status

The output has the following format:

Uptime: 5721024  Threads: 14  Questions: 7874982
Slow queries: 6  Opens: 115136  Flush tables: 1
Open tables: 62

This gives a brief point-in-time summary of the DBMS status and can help find more about the number of user connections, queries, and table use. Similar output can be generated by running the commands SHOW STATUS and SHOW VARIABLES through the MySQL command interpreter.

Information about query performance can be gained with the benchmark( ) function, which can be used iteratively for tuning when altering table design or DBMS system parameters. The following statement illustrates benchmarking:

SELECT benchmark(10000, COUNT(*))
  FROM items;

This statement reports the time taken to evaluate 10,000 calls to COUNT( ) on the items table.