How to Use Indexes to Increase MySQL Database Performance

What are Indexes?

Have you ever looked for a chapter in a huge book by turning to the table of contents or the index page to find its page number? A DBMS uses indexes in a similar way to speed up data retrieval. An index is built from the values (keys) of one or more columns in a table, and it contains pointers that tell MySQL where a particular row of data is stored. This lets the MySQL server find that row without scanning every row of the table, which boosts query speed. Additionally, indexes enable data to be better organized on disk. The MySQL join optimizer also uses indexes to speed up queries that involve joins.

Indexing is one of the most powerful features of a database, and it's usually the first thing database engineers consider when optimizing a database for speed.

Indexing basics

Creating an index in MySQL is done using the CREATE INDEX command, and has the following syntax:

CREATE INDEX index_name ON table_name (column1, column2, ...);
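
As a concrete illustration of the syntax above, here is a minimal sketch. It assumes a Customer table with last_name, first_name, and email columns that has no indexes on them yet; the index names idx_customer_email and idx_customer_name are made up for this example.

```sql
-- Single-column index on email (the index name is an assumption).
CREATE INDEX idx_customer_email ON Customer (email);

-- Composite index: column order matters, because lookups can use
-- (last_name) or (last_name, first_name), but not first_name alone.
CREATE INDEX idx_customer_name ON Customer (last_name, first_name);

-- List the indexes that now exist on the table.
SHOW INDEX FROM Customer;
```

SHOW INDEX is a convenient way to confirm the name, column order, and type of each index after creating it.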

You can also set up indexes when creating a new table by adding an INDEX clause to the CREATE TABLE statement:

CREATE TABLE Customer (
  last_name  varchar(30) NOT NULL,
  first_name varchar(30) NOT NULL,
  email      varchar(50) NOT NULL,
  INDEX (email)
);

You can also use a PRIMARY KEY clause instead of an INDEX clause. Note, however, that a table can have only one PRIMARY KEY, and the key columns must contain unique values and no NULL values. The primary key can span several columns, as in this example:

CREATE TABLE Customer (
  id         int NOT NULL,
  last_name  varchar(30) NOT NULL,
  first_name varchar(30) NOT NULL,
  email      varchar(50) NOT NULL,
  CONSTRAINT PK_Customer PRIMARY KEY (id, last_name)
);

For existing tables, you can also add indexes as follows:

ALTER TABLE Customer ADD INDEX (email);
ALTER TABLE Customer ADD PRIMARY KEY (id);

Deleting an existing index is simple: use DROP INDEX with the name of the index. An index created without an explicit name is named after its first column, so the index added above is called email:

ALTER TABLE Customer DROP INDEX email;

Types of Indexes

MySQL offers several types of indexes, usually categorized by their data structure. Most MySQL indexes (PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT) are stored in B-trees. There are exceptions: indexes on spatial data types use R-trees, InnoDB uses inverted lists for FULLTEXT indexes, and MEMORY tables use hash indexes by default but also support B-tree indexes.

In the following sections, we'll look at B-tree and hash indexes, and then at the benefits and drawbacks of indexes in general.

B-Tree indexes

Most storage engines in MySQL use B-tree indexes by default. B-tree indexes can be used for column comparisons in expressions that use the =, >, >=, <, <=, or BETWEEN operators. They can also be used in LIKE comparisons if the argument is a constant string that does not start with a wildcard (%,_) character.

B-Tree Indexes Characteristics

B-trees are search trees commonly used by large databases to access data stored on disk. Because of their properties, a lookup in a large data set takes significantly less time than a full scan. The B-tree used for an index stores all its values in leaves that are sorted in increasing order. All leaves are at the same level, so they are all the same distance from the root node.

When the storage engine performs a lookup on a table, it doesn't need to scan all the rows to find the desired data. Instead, it traverses the B-tree starting from the root node. The root node holds pointers to child nodes, and the storage engine follows these pointers until it reaches a leaf page that contains pointers to the indexed data. This process is fast because the indexed values are arranged in order. For example, when looking up a value in an index on a text column, the storage engine can rely on the entries being in alphabetical order. The steps MySQL's storage engine takes to locate a piece of indexed data in a B-tree are summarized below:

  1. Start at the root node and proceed to the next level in the tree.
  2. Find a node that contains a range of values between a lower and upper limit (for example, every country whose name begins with S through T).
  3. Traverse the tree until it finds the node with the closest range using pointers between nodes.
  4. If the node is found, it traverses through the leaf pages using pointers until the leaf page with the indexed data is found.

Considerations When Using B-Tree Indexes

Earlier, we stated that B-tree indexes can be used in LIKE comparisons. In order to use an index, the argument must be a constant string that does not start with a wildcard character. For example, the following SELECT statements use indexes:

SELECT * FROM customer WHERE country LIKE 'Spain%';
SELECT * FROM customer WHERE country LIKE 'Sp%_in%';

In the first statement, only rows with 'Spain' <= country < 'Spaio' are considered. In the second statement, only rows with 'Sp' <= country < 'Sq' are considered.

Note that MySQL derives the upper bound by replacing the last character of the constant prefix with the next character in sort order: the letter “o” comes after “n”, and “q” comes after “p”.

The following SELECT statements will not use indexes:

SELECT * FROM customer WHERE country LIKE '%Spain%';
SELECT * FROM customer WHERE country LIKE another_col;

The first statement starts with a wildcard, so it cannot use an index. The second statement cannot use one either, because its LIKE argument is not a constant string.
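
One way to check this behavior yourself is to run the statements through EXPLAIN. The sketch below assumes the customer table has no index on country yet; the index name idx_country is hypothetical.

```sql
-- Hypothetical index on the column used in the LIKE comparisons.
CREATE INDEX idx_country ON customer (country);

-- Constant prefix: the optimizer can turn this into a range on idx_country.
EXPLAIN SELECT * FROM customer WHERE country LIKE 'Spain%';

-- Leading wildcard: there is no constant prefix to search the index with.
EXPLAIN SELECT * FROM customer WHERE country LIKE '%Spain%';
```

In the EXPLAIN output, the key column names the index MySQL chose (NULL if none) and the type column shows the access method. The exact plan depends on your data, so treat the comments as expectations rather than guarantees.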

Any index that fails to span all AND levels in a WHERE clause is not used in optimizing the query. In other words, to be able to use an index, a prefix of the index must be used in every AND group. For example, the following statements use indexes:

SELECT * FROM table_name WHERE index_part1=1 AND index_part2=2 AND other_column=3;

/* index = 1 OR index = 2 (INDEX is a reserved word, so the column name is quoted) */
SELECT * FROM table_name WHERE `index`=1 OR A=10 AND `index`=2;

/* optimized like "index_part1='hello'" */
SELECT * FROM table_name WHERE index_part1='hello' AND index_part3=5;

/* Can use index on index1 but not on index2 or index3 */
SELECT * FROM table_name WHERE index1=1 AND index2=2 OR index1=3 AND index3=3;

But these WHERE clauses do not use indexes:

/* index_part1 is not used */
SELECT * FROM table_name WHERE index_part2=1 AND index_part3=2;

/* Index is not used in both parts of the WHERE clause */
SELECT * FROM table_name WHERE `index`=1 OR A=10;

/* No index spans all rows */
SELECT * FROM table_name WHERE index_part1=1 OR index_part2=10;
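
To make the leftmost-prefix rule more tangible, here is a small sketch with a made-up orders table and a three-column index. The table, columns, and index name are assumptions for illustration only, and the comments describe what the rule predicts, not guaranteed plans.

```sql
-- Hypothetical table with a composite index on three columns.
CREATE TABLE orders (
  id          INT NOT NULL PRIMARY KEY,
  customer_id INT NOT NULL,
  status      VARCHAR(20) NOT NULL,
  created_at  DATE NOT NULL,
  total       DECIMAL(10,2) NOT NULL,
  INDEX idx_cust_status_date (customer_id, status, created_at)
);

-- Uses the leftmost prefix (customer_id, status) for the lookup.
EXPLAIN SELECT * FROM orders
WHERE customer_id = 42 AND status = 'shipped';

-- Only the customer_id part can drive the lookup, because status is skipped.
EXPLAIN SELECT * FROM orders
WHERE customer_id = 42 AND created_at = '2022-10-21';

-- customer_id is missing, so no leftmost prefix is available for a lookup.
EXPLAIN SELECT * FROM orders
WHERE status = 'shipped' AND created_at = '2022-10-21';
```

Comparing the key and key_len columns across the three EXPLAIN results shows how much of the index each query was able to use.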

Sometimes the MySQL query optimizer does not use an index even when one is available. One reason is that the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table, in which case a table scan is likely to be faster. To fully understand how the MySQL optimizer treats queries that utilize indexes, I recommend you check this article on query performance optimization.

In summary, it’s good practice to confirm that your queries actually use the indexes you created. If the optimizer makes the wrong choice, you can use index hints to override the default behavior. We discussed all of that in the suggested article.
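
As a rough sketch of both ideas, the statements below first inspect the plan and then apply each kind of index hint. They assume a customer table with an index named idx_country on the country column; that index name is hypothetical.

```sql
-- Check which index the optimizer picks (see the key column in the output).
EXPLAIN SELECT * FROM customer WHERE country = 'Spain';

-- Restrict the optimizer to the named index; it may still choose a table scan.
SELECT * FROM customer USE INDEX (idx_country) WHERE country = 'Spain';

-- Like USE INDEX, but a table scan is assumed to be very expensive.
SELECT * FROM customer FORCE INDEX (idx_country) WHERE country = 'Spain';

-- Exclude the named index from consideration.
SELECT * FROM customer IGNORE INDEX (idx_country) WHERE country = 'Spain';
```

Hints are best kept as a last resort, since they stay in place even after the data distribution changes and the optimizer's own choice would be better.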

Hash Indexes

Hash indexes are also used by MySQL to improve data access speed, just like B-tree indexes. The differences, however, are:

  • Hash indexes can only be used in equality comparisons that use the = or <=> operators.
  • They cannot be used to optimize ORDER BY operations, because a hash index does not store entries in order and so cannot be used to find the next entry in order.
  • They are not suited to range queries, and MySQL cannot determine approximately how many rows there are between two values, which the range optimizer uses to decide which index to use.
  • Unlike B-tree indexes that can use any leftmost prefix of a key to find rows, Hash indexes can only use whole keys to find rows.
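
The differences above can be tried out on a MEMORY table, which supports both index types. This is a minimal sketch; the session_cache table, its columns, and the index names are invented for the example.

```sql
-- Hypothetical MEMORY table with one hash index and one B-tree index.
CREATE TABLE session_cache (
  session_id CHAR(32) NOT NULL,
  user_id    INT NOT NULL,
  INDEX idx_session (session_id) USING HASH,
  INDEX idx_user (user_id) USING BTREE
) ENGINE=MEMORY;

-- Equality lookup on the whole key: a good fit for the hash index.
EXPLAIN SELECT * FROM session_cache WHERE session_id = 'abc123';

-- Range comparison: only the B-tree index can help here.
EXPLAIN SELECT * FROM session_cache WHERE user_id BETWEEN 100 AND 200;
```

Choosing USING BTREE explicitly is the usual way to get range and ORDER BY support on a MEMORY table, at the cost of slower single-key lookups than a hash index.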

Advantages of indexes in MySQL

By now you should know that the primary benefit of indexes is in speeding up search queries by enabling the MySQL server to navigate to a particular location in a table much quicker. Other benefits include:

  • Uniqueness: PRIMARY KEY and UNIQUE indexes prevent duplicate values in the indexed columns.
  • Full-text search: FULLTEXT indexes let MySQL run full-text searches, so you can search large amounts of text stored in the indexed CHAR, VARCHAR, or TEXT columns.
  • Reducing the number of Disk I/Os (page fetches).
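
The uniqueness and full-text benefits can be sketched as follows. The article table, its columns, and the index names are assumptions made up for this example.

```sql
-- Hypothetical table with a UNIQUE index and a FULLTEXT index.
CREATE TABLE article (
  id    INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  slug  VARCHAR(100) NOT NULL,
  title VARCHAR(200) NOT NULL,
  body  TEXT NOT NULL,
  UNIQUE INDEX uq_article_slug (slug),
  FULLTEXT INDEX ft_article_text (title, body)
) ENGINE=InnoDB;

INSERT INTO article (slug, title, body)
VALUES ('mysql-indexes', 'MySQL indexes', 'How B-tree indexes speed up queries.');

-- Inserting another row with slug 'mysql-indexes' would now fail
-- with a duplicate-key error because of uq_article_slug.

-- Full-text search; the MATCH column list must match the FULLTEXT index.
SELECT id, title
FROM article
WHERE MATCH(title, body) AGAINST('indexes' IN NATURAL LANGUAGE MODE);
```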

Disadvantages of indexes in MySQL

Indexes are great for faster data lookup, but they come with some drawbacks. The main ones are as follows:

  • Slower writes: Indexes slow down INSERT, UPDATE, and DELETE queries, because every change to an indexed column must also be applied to each index that contains it.
  • Increased disk usage: Each index is stored in addition to the table data, so more indexes mean more disk space.
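
To get a feel for the storage cost, you can compare data size and index size per table. The query below is a sketch that assumes your tables live in a schema named coderpad (the database name used later in this article); for InnoDB the reported sizes are estimates.

```sql
-- Compare the space used by row data and by indexes for each table.
SELECT table_name,
       data_length,
       index_length
FROM information_schema.tables
WHERE table_schema = 'coderpad'
ORDER BY index_length DESC;
```

A table whose index_length rivals or exceeds its data_length is a reasonable candidate for reviewing whether every index is still needed.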

Maintaining indexes to improve query performance and reduce resource consumption

This section covers index maintenance concepts, such as index fragmentation, and the impact they have on query performance and resource consumption. You will learn how and when to rebuild or repair tables and indexes. You will also learn how to identify and remove index fragmentation.

Index Fragmentation

Index fragmentation wastes disk space and can hinder performance. Fragmentation means that the physical ordering of the index pages on the disk is not close to the index ordering of the records on the pages, or that there are many unused pages in the 64-page blocks that were allocated to the index. For example, when MySQL deletes rows of data from a table, it leaves empty space on the disk. Over time this space grows and causes gaps (fragments) in the space allocated for that table. MySQL will try to reuse the gaps when inserting new data, but some of them may persist, and this eventually leads to fragmentation.

Using the SHOW TABLE STATUS command, we can check if fragmentation has occurred in a table. To demonstrate, let’s create a database and a table and then perform some writes to the table:

mysql> CREATE DATABASE coderpad;
Query OK, 1 row affected (0.01 sec)

mysql> USE coderpad;
Database changed

mysql> CREATE TABLE frag_table (col1 varchar(64)) ENGINE=MyISAM;
Query OK, 0 rows affected (0.59 sec)

Let’s insert some rows of data to our new table:

mysql> INSERT INTO frag_table VALUES ('row 1');
Query OK, 1 row affected (0.02 sec)

mysql> INSERT INTO frag_table VALUES ('row 2');
Query OK, 1 row affected (0.02 sec)

mysql> INSERT INTO frag_table VALUES ('row 3');
Query OK, 1 row affected (0.04 sec)

Let’s check if fragmentation is present in the table:

mysql> SHOW TABLE STATUS FROM coderpad\G
*************************** 1. row ***************************
           Name: frag_table
         Engine: MyISAM
        Version: 10
     Row_format: Dynamic
           Rows: 3
 Avg_row_length: 20
    Data_length: 60
Max_data_length: 281474976710655
   Index_length: 1024
      Data_free: 0
 Auto_increment: 1
    Create_time: 2022-10-21 09:04:33
    Update_time: 2022-10-21 09:04:33
     Check_time: NULL
      Collation: utf8mb4_0900_ai_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.01 sec)

The Data_free column shows the number of allocated but unused bytes, that is, the gaps in our table. At the moment, its value is 0, meaning there’s no fragmentation yet. The value grows as rows are deleted or updated and leave gaps behind.
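
To see the value change, you could delete a row from the middle of the table and check the status again. This is a sketch that continues the frag_table example; no output is shown because the exact numbers depend on your server.

```sql
-- Delete a row from the middle of the MyISAM table created above.
DELETE FROM frag_table WHERE col1 = 'row 2';

-- Re-check the table; for this MyISAM table, Data_free should now be above 0.
SHOW TABLE STATUS FROM coderpad LIKE 'frag_table'\G
```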

Defragmenting a table

MySQL offers a way to defragment a table using a simple command, OPTIMIZE TABLE. We can defragment the table in our previous example as follows:

OPTIMIZE TABLE frag_table;

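Another way to rebuild a table is a “null” ALTER TABLE operation, which sets the storage engine to the one the table already uses. For an InnoDB table, that looks as follows (running it on our MyISAM frag_table would convert the table to InnoDB instead of rebuilding it in place):

ALTER TABLE frag_table ENGINE=INNODB;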

Another way to defragment a table is to dump the table to a text file using mysqldump, drop the table, and then reload it from the file.

Rebuilding or repairing tables or indexes

Changes to how MySQL handles data types or character sets may require us to rebuild or repair tables and indexes. For example, after an error in a collation has been corrected, a table rebuild is needed to update the indexes on character columns that use that collation. We already explored another use case, fragmentation, in the previous section. A rebuild or repair may also be required by reports from CHECK TABLE, mysqlcheck, or mysql_upgrade.

There are three methods available for rebuilding a table:

  • Dump and reload method.
  • ALTER TABLE method.
  • REPAIR TABLE method.

Dump and reload method

This method involves using mysqldump to create a dump file and then using the mysql client to reload the file.

To rebuild a table named “table_name”, you can dump it and reload it as follows:

mysqldump db_name table_name > dump.sql
mysql db_name < dump.sql

To rebuild all tables in a database, you can omit the table_name as follows:

mysqldump db_name > dump.sql
mysql db_name < dump.sql

ALTER TABLE method

You can also use the ALTER TABLE method covered in the section titled “Defragmenting a table”. Specify the storage engine the table already uses so that the table is rebuilt rather than converted. For example, if frag_table were an InnoDB table:

ALTER TABLE frag_table ENGINE=INNODB;

REPAIR TABLE method

The REPAIR TABLE method is used when the CHECK TABLE operation indicates that corruption exists or an update is required. This REPAIR TABLE method is only applicable to MyISAM, ARCHIVE, and CSV tables. The syntax is as follows:

REPAIR TABLE table_name;
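
A typical sequence is to check the table first and repair it only if a problem is reported. The sketch below reuses the MyISAM frag_table from the fragmentation example; it is a good idea to back up a table before repairing it.

```sql
-- Check the table; the Msg_text column reports OK or describes the problem.
CHECK TABLE frag_table;

-- Repair only if CHECK TABLE reported corruption (MyISAM, ARCHIVE, CSV only).
REPAIR TABLE frag_table;
```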

For convenience, you can use the mysqlcheck --repair command to run the REPAIR TABLE statement from the command line:

mysqlcheck --repair --databases db_name ...
mysqlcheck --repair --all-databases

OPTIMIZE TABLE

While REPAIR TABLE fixes table corruption problems, OPTIMIZE TABLE reorganizes the physical storage of table data and associated index data to remove unused space. This improves I/O efficiency when accessing the table. The exact changes depend on the storage engine the table uses. Also, note that this statement works only for InnoDB, MyISAM, and ARCHIVE tables, plus the dynamic columns of in-memory NDB tables. For other storage engines, OPTIMIZE TABLE reports that it is not supported unless you start mysqld with the --skip-new option, in which case OPTIMIZE TABLE is just mapped to ALTER TABLE.

MySQL maps the OPTIMIZE TABLE statement to ALTER TABLE ... FORCE for InnoDB tables, which rebuilds the table to update index statistics and free unused space in the clustered index. The statement prints a result set showing the resulting status, as follows:

mysql> OPTIMIZE TABLE frag_table;
+-----------------+----------+----------+-------------------------------------------------------------------+
| Table           | Op       | Msg_type | Msg_text                                                          |
+-----------------+----------+----------+-------------------------------------------------------------------+
| test.frag_table | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| test.frag_table | optimize | status   | OK                                                                |
+-----------------+----------+----------+-------------------------------------------------------------------+
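
Since OPTIMIZE TABLE on an InnoDB table amounts to a rebuild followed by an analyze, the same effect can be sketched with two explicit statements. The table name t_innodb is hypothetical and stands for any InnoDB table.

```sql
-- Rebuild the InnoDB table, freeing unused space in the clustered index.
ALTER TABLE t_innodb FORCE;

-- Refresh the index statistics used by the optimizer.
ANALYZE TABLE t_innodb;
```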

Final thoughts

Indexing is one of the most discussed topics in DBMS. It can also be the most powerful tool in your database optimization arsenal if utilized well.

A common mistake developers make is trying to add indexes everywhere. This isn’t recommended because, as we’ve noted earlier in this guide, indexes consume storage space and slow down write operations in your database. Therefore, if you know your database will handle lots of writes, take your time and decide where indexes are needed the most. The opposite is also true: if you’ll be performing lots of read operations, you can optimize your queries by adding indexes to the frequently searched columns.

If you’re not sure about index performance in your queries, MySQL has tools that can help you analyze a query and determine whether indexes were used in optimizing it. We discussed that in detail in our article on query optimization.
