You are here

SQL Query Tuning - Performance

How could the following SQL queries be improved performance wise and otherwise and can you also explain why your change is more optimal?

Please consider, when testing, that your results are not confused by the Query Cache or by reading data from your I/O system which are an order of magnitude faster (Query Cache) or slower (I/O) than the in-memory behaviour.

Table of Contents


SQL Query Tuning - Covering function

Difficulty: Very easy.

Query:

SELECT *
  FROM bills
 WHERE UNIX_TIMESTAMP(due_date) < 1601560170
;

Table and data:

CREATE TABLE bills (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, amount DECIMAL (10,2) NOT NULL
, due_date DATETIME NOT NULL
, customer_id SMALLINT UNSIGNED NOT NULL
, KEY `due_date` (`due_date`)
);

INSERT INTO bills
SELECT NULL, ROUND(RAND()*10000, 2), FROM_UNIXTIME(UNIX_TIMESTAMP()+RAND()*86400*30), ROUND(RAND()*10000, 0);
INSERT INTO bills
SELECT NULL, ROUND(RAND()*10000, 2), FROM_UNIXTIME(UNIX_TIMESTAMP()+RAND()*86400*30), ROUND(RAND()*10000, 0)
  FROM bills;
-- Repeat this command about 14 times

Query Execution Plan (EXPLAIN) before improvement:

+------+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|    1 | SIMPLE      | bills | ALL  | NULL          | NULL | NULL    | NULL | 16300 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+-------+-------------+

Query runtime before improvement: 11 - 12 ms.

Query Execution Plan (EXPLAIN) after improvement:

+------+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------+
| id   | select_type | table | type  | possible_keys | key      | key_len | ref  | rows | Extra                 |
+------+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------+
|    1 | SIMPLE      | bills | range | due_date      | due_date | 5       | NULL | 1    | Using index condition |
+------+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------+

Query runtime after improvement: About 1 ms.

Difficulty: Very easy to pretty costly.

Query:

SELECT *
  FROM customers
 WHERE email LIKE '%hans.meier%'
;

Table and data:

CREATE TABLE customers (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, first_name VARCHAR(63)
, last_name VARCHAR(127)
, email VARCHAR(255)
, zip_code VARCHAR(9)
, city VARCHAR(63)
, KEY `email` (`email`)
, INDEX (city, last_name, first_name)
);

INSERT INTO customers SELECT NULL, MD5(RAND()), MD5(RAND()), CONCAT(CRC32(RAND()), '@', MD5(RAND())), ROUND(RAND()*9999, 0), CRC32(RAND())
;
INSERT INTO customers SELECT NULL, MD5(RAND()), MD5(RAND()), CONCAT(CRC32(RAND()), '@', MD5(RAND())), ROUND(RAND()*9999, 0), CRC32(RAND())
  FROM customers;
-- Repeat this command about 14 times

Query Execution Plan (EXPLAIN) before improvement:

+------+-------------+-----------+------+---------------+------+---------+------+--------+-------------+
| id   | select_type | table     | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+------+-------------+-----------+------+---------------+------+---------+------+--------+-------------+
|    1 | SIMPLE      | customers | ALL  | NULL          | NULL | NULL    | NULL | 324988 | Using where |
+------+-------------+-----------+------+---------------+------+---------+------+--------+-------------+

Query runtime before improvement: About 150 ms.

Query Execution Plan (EXPLAIN) after improvement:

+------+-------------+-----------+-------+---------------+-------+---------+------+------+-----------------------+
| id   | select_type | table     | type  | possible_keys | key   | key_len | ref  | rows | Extra                 |
+------+-------------+-----------+-------+---------------+-------+---------+------+------+-----------------------+
|    1 | SIMPLE      | customers | range | email         | email | 258     | NULL | 1    | Using index condition |
+------+-------------+-----------+-------+---------------+-------+---------+------+------+-----------------------+

Query runtime after improvement: About 1 ms.

SQL Query Tuning - Covering function twice

Difficulty: Very easy to pretty costly.

Query:

SELECT *
  FROM customers
 WHERE LOWER(email) = LOWER('hans.meier@fromdual.com')
;

Table and data: See above.

Query Execution Plan (EXPLAIN) before improvement:

+------+-------------+-----------+------+---------------+------+---------+------+--------+-------------+
| id   | select_type | table     | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+------+-------------+-----------+------+---------------+------+---------+------+--------+-------------+
|    1 | SIMPLE      | customers | ALL  | NULL          | NULL | NULL    | NULL | 519800 | Using where |
+------+-------------+-----------+------+---------------+------+---------+------+--------+-------------+

Query runtime before improvement: About 140 ms.

Query Execution Plan (EXPLAIN) after improvement:

+------+-------------+-----------+------+---------------+-------+---------+-------+------+-----------------------+
| id   | select_type | table     | type | possible_keys | key   | key_len | ref   | rows | Extra                 |
+------+-------------+-----------+------+---------------+-------+---------+-------+------+-----------------------+
|    1 | SIMPLE      | customers | ref  | email         | email | 258     | const | 1    | Using index condition |
+------+-------------+-----------+------+---------------+-------+---------+-------+------+-----------------------+

Query runtime after improvement: About 1 ms.

SQL Query Tuning - Compare 2 indexed columns

Difficulty: Medium.

Query:

SELECT COUNT(*)
  FROM cdrs
 WHERE start >= end
;

Table and data:

CREATE TABLE `cdrs` (
  `uniqueid` varchar(40) NOT NULL,
  `callid` varchar(40) NOT NULL,
  `asteriskid` varchar(20) NOT NULL DEFAULT '',
  `machine` int(11) NOT NULL DEFAULT 0,
  `status` varchar(15) NOT NULL DEFAULT '',
  `start` TIMESTAMP(6) NOT NULL,
  `end` TIMESTAMP(6) NOT NULL,
  `scustomer` int(11) NOT NULL DEFAULT 0,
  `stype` varchar(30) NOT NULL DEFAULT '',
  `snumber` varchar(255) NOT NULL DEFAULT '',
  `dcustomer` int(11) NOT NULL DEFAULT 0,
  `dtype` varchar(30) NOT NULL DEFAULT '',
  `dnumber` varchar(255) NOT NULL DEFAULT '',
  PRIMARY KEY (`uniqueid`),
  KEY `server` (`callid`),
  KEY `start` (`start`),
  KEY `scustomer` (`scustomer`,`start`),
  KEY `dcustomer` (`dcustomer`,`start`),
  KEY `asteriskid` (`asteriskid`)
);

INSERT INTO cdrs SELECT UUID(), MD5(RAND()), '', 0, '', FROM_UNIXTIME(ROUND(RAND()*UNIX_TIMESTAMP(), 6)), 0, 0, '', '', 0, '', '';
INSERT INTO cdrs SELECT UUID(), MD5(RAND()), '', 0, '', FROM_UNIXTIME(ROUND(RAND()*UNIX_TIMESTAMP(), 6)), 0, 0, '', '', 0, '', '' FROM cdrs;
-- Repeat this command about 20 times...
UPDATE cdrs SET end = FROM_UNIXTIME(UNIX_TIMESTAMP(start)+RAND()*3600);

Query Execution Plan (EXPLAIN) before improvement:

+------+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+--------+-------------+
|    1 | SIMPLE      | cdrs  | ALL  | NULL          | NULL | NULL    | NULL | 520304 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+--------+-------------+

Query runtime before improvement: About 110 ms.

Query Execution Plan (EXPLAIN) after improvement:

+------+-------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
| id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows   | Extra                    |
+------+-------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
|    1 | SIMPLE      | cdrs  | index | NULL          | start_2 | 16      | NULL | 260250 | Using where; Using index |
+------+-------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+

Query runtime after improvement: About 105 ms. Not so much improvement achieved. But should be come much better when you have memory pressure.

SQL Query Tuning - Function and 2 indexed columns

Difficulty: Medium.

Query:

SELECT uniqueid
  FROM cdrs
 WHERE CURRENT_TIMESTAMP() BETWEEN start AND end
;

Table and data: See above.

Query Execution Plan (EXPLAIN) before improvement:

+------+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+--------+-------------+
|    1 | SIMPLE      | cdrs  | ALL  | start         | NULL | NULL    | NULL | 520304 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+--------+-------------+

Query runtime before improvement: About 120 ms.

Query Execution Plan (EXPLAIN) after improvement:

+------+-------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
| id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows   | Extra                    |
+------+-------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
|    1 | SIMPLE      | cdrs  | range | start,start_2 | start_2 | 7       | NULL | 260152 | Using where; Using index |
+------+-------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+

Query runtime after improvement: About 140 ms. Worse than original. Depends heavily on data distribution!

SQL Query Tuning - Time series problem

Difficulty: High.

Query:

SELECT *
  FROM history_of_int_values
 WHERE item_id = 42
   AND timestamp BETWEEN '2020-10-01 00:00:00' AND '2010-10-31 23:59:59'
;

Table and data:

CREATE TABLE `history_of_int_values` (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `item_id` bigint(20) unsigned NOT NULL,
  `timestamp` timestamp(6) NOT NULL,
  `value` bigint(20) signed NOT NULL DEFAULT '0',
  UNIQUE KEY (`item_id`,`timestamp`),
  KEY (timestamp)
) ENGINE = InnoDB;

INSERT INTO history_of_int_values
SELECT NULL, ROUND(RAND()*100, 0), FROM_UNIXTIME(UNIX_TIMESTAMP(CURRENT_TIMESTAMP(6))-(86400*365*rand())), ROUND(RAND()*1000000, 0);
INSERT IGNORE INTO history_of_int_values
SELECT NULL, ROUND(RAND()*100, 0), FROM_UNIXTIME(UNIX_TIMESTAMP(CURRENT_TIMESTAMP(6))-(86400*365*rand())), ROUND(RAND()*1000000, 0) FROM history_of_int_values;

Query Execution Plan (EXPLAIN) before improvement:

+------+-------------+-----------------------+-------+-------------------+---------+---------+------+------+-----------------------+
| id   | select_type | table                 | type  | possible_keys     | key     | key_len | ref  | rows | Extra                 |
+------+-------------+-----------------------+-------+-------------------+---------+---------+------+------+-----------------------+
|    1 | SIMPLE      | history_of_int_values | range | item_id,timestamp | item_id | 15      | NULL | 859  | Using index condition |
+------+-------------+-----------------------+-------+-------------------+---------+---------+------+------+-----------------------+

Query runtime before improvement: 11 ms.

Query Execution Plan (EXPLAIN) after improvement:

+------+-------------+-----------------------+-------+-------------------+---------+---------+------+------+-------------+
| id   | select_type | table                 | type  | possible_keys     | key     | key_len | ref  | rows | Extra       |
+------+-------------+-----------------------+-------+-------------------+---------+---------+------+------+-------------+
|    1 | SIMPLE      | history_of_int_values | range | PRIMARY,timestamp | PRIMARY | 15      | NULL | 859  | Using where |
+------+-------------+-----------------------+-------+-------------------+---------+---------+------+------+-------------+

Query runtime after improvement: About 1 ms. Has dramatic impact if you run out of RAM!

SQL Query Tuning - Wrong built index or query

Difficulty: Medium.

Query:

SELECT *
  FROM customers
 WHERE last_name = 'Maier'
   AND first_name = 'Hans'
;

Table and data: See above.

Query Execution Plan (EXPLAIN) before improvement:

+------+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
| id   | select_type | table     | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+------+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
|    1 | SIMPLE      | customers | ALL  | NULL          | NULL | NULL    | NULL | 65044 | Using where |
+------+-------------+-----------+------+---------------+------+---------+------+-------+-------------+

Query runtime before improvement: About 20 ms.

Here we have 2 different possible solutions which come to my mind:

a) Query Execution Plan (EXPLAIN) after improvement:

+------+-------------+-----------+------+---------------+-----------+---------+-------------+------+-----------------------+
| id   | select_type | table     | type | possible_keys | key       | key_len | ref         | rows | Extra                 |
+------+-------------+-----------+------+---------------+-----------+---------+-------------+------+-----------------------+
|    1 | SIMPLE      | customers | ref  | last_name     | last_name | 196     | const,const | 1    | Using index condition |
+------+-------------+-----------+------+---------------+-----------+---------+-------------+------+-----------------------+

a) Query runtime after improvement: About 1 ms.

b) Query Execution Plan (EXPLAIN) after improvement:

+------+-------------+-----------+------+---------------+------+---------+-------------------+------+-----------------------+
| id   | select_type | table     | type | possible_keys | key  | key_len | ref               | rows | Extra                 |
+------+-------------+-----------+------+---------------+------+---------+-------------------+------+-----------------------+
|    1 | SIMPLE      | customers | ref  | city          | city | 262     | const,const,const | 1    | Using index condition |
+------+-------------+-----------+------+---------------+------+---------+-------------------+------+-----------------------+

b) Query runtime after improvement: About 1 ms.

SQL Query Tuning - OR optimization

Difficulty: Medium.

Query:

SELECT *
  FROM customers
 WHERE last_name = 'Meier' OR first_name = 'Hans'
;

Table and data: See above.

Query Execution Plan (EXPLAIN) before improvement:

+------+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
| id   | select_type | table     | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+------+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
|    1 | SIMPLE      | customers | ALL  | NULL          | NULL | NULL    | NULL | 65044 | Using where |
+------+-------------+-----------+------+---------------+------+---------+------+-------+-------------+

Query runtime before improvement: About 20 ms.

Here we have 2 different possible solutions which come to my mind:

a) Query Execution Plan (EXPLAIN) after improvement:

+------+-------------+-----------+-------------+----------------------+----------------------+---------+------+------+------------------------------------------------+
| id   | select_type | table     | type        | possible_keys        | key                  | key_len | ref  | rows | Extra                                          |
+------+-------------+-----------+-------------+----------------------+----------------------+---------+------+------+------------------------------------------------+
|    1 | SIMPLE      | customers | index_merge | last_name,first_name | last_name,first_name | 130,66  | NULL | 2    | Using union(last_name,first_name); Using where |
+------+-------------+-----------+-------------+----------------------+----------------------+---------+------+------+------------------------------------------------+

a) Query runtime after improvement: About 2 ms.

b) Query Execution Plan (EXPLAIN) after improvement:

+------+--------------+------------+------+---------------+------------+---------+-------+------+-----------------------+
| id   | select_type  | table      | type | possible_keys | key        | key_len | ref   | rows | Extra                 |
+------+--------------+------------+------+---------------+------------+---------+-------+------+-----------------------+
|    1 | PRIMARY      | customers  | ref  | last_name     | last_name  | 130     | const | 1    | Using index condition |
|    2 | UNION        | customers  | ref  | first_name    | first_name | 66      | const | 1    | Using index condition |
| NULL | UNION RESULT | <union1,2> | ALL  | NULL          | NULL       | NULL    | NULL  | NULL |                       |
+------+--------------+------------+------+---------------+------------+---------+-------+------+-----------------------+

b) Query runtime after improvement: About 1 ms.

Nested subquery DELETE

Difficulty: Medium.

Comment: Problem does NOT happen with equivalent SELECT! It happens up to MySQL 5.7 and MariaDB 10.7. It seems to be fixed in the MySQL 8.0 Optimizer. UPDATE behaves the same like the DELETE statement:

Query:

DELETE FROM test
 WHERE id IN (SELECT id FROM t2)
;

Query Execution Plan (EXPLAIN) before improvement:

+------+--------------------+-------+-----------------+---------------+---------+---------+------+---------+-------------+
| id   | select_type        | table | type            | possible_keys | key     | key_len | ref  | rows    | Extra       |
+------+--------------------+-------+-----------------+---------------+---------+---------+------+---------+-------------+
|    1 | PRIMARY            | test  | ALL             | NULL          | NULL    | NULL    | NULL | 8021228 | Using where |
|    2 | DEPENDENT SUBQUERY | t2    | unique_subquery | PRIMARY       | PRIMARY | 4       | func | 1       | Using index |
+------+--------------------+-------+-----------------+---------------+---------+---------+------+---------+-------------+

Query runtime before improvement: 9.859 sec

Query Execution Plan (EXPLAIN) after improvement:

+------+-------------+-------+--------+---------------+---------+---------+------------+------+-------------+
| id   | select_type | table | type   | possible_keys | key     | key_len | ref        | rows | Extra       |
+------+-------------+-------+--------+---------------+---------+---------+------------+------+-------------+
|    1 | SIMPLE      | t2    | index  | PRIMARY       | PRIMARY | 4       | NULL       | 6    | Using index |
|    1 | SIMPLE      | test  | eq_ref | PRIMARY       | PRIMARY | 4       | test.t2.id | 1    |             |
+------+-------------+-------+--------+---------------+---------+---------+------------+------+-------------+

Query runtime after improvement: 0.001 sec

See also MDEV-22248 and MDEV-22415. Should be fixed in MariaDB 10.8.

Wrong casting

CREATE TABLE `order` (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT
, order_id VARCHAR(64)
, data VARCHAR(255) NULL DEFAULT 'Blabla'
, PRIMARY KEY (id)
, UNIQUE INDEX (order_id)
);

INSERT INTO `order` SELECT NULL, NULL FROM dual;
INSERT INTO `order` SELECT NULL, NULL FROM `order`;
-- up to 256k rows
UPDATE `order` SET order_id = id;

EXPLAIN
SELECT *
  FROM `order`
 WHERE order_id BETWEEN 238900 AND 238999
;
+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+
|    1 | SIMPLE      | order | ALL  | order_id      | NULL | NULL    | NULL | 2085934 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+

Execution time: 800 milliseconds

Optimized Query:

+------+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------+
| id   | select_type | table | type  | possible_keys | key      | key_len | ref  | rows | Extra                 |
+------+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------+
|    1 | SIMPLE      | order | range | order_id      | order_id | 67      | NULL | 999  | Using index condition |
+------+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------+

Execution time: 10 milliseconds (f=80!)

See also: MDEV-26729

Non normalized join column

CREATE TABLE t1 (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, data VARCHAR(64)
, ts TIMESTAMP
);

INSERT INTO t1 SELECT NULL, 'Some dummy data', NOW();
INSERT INTO t1 SELECT NULL, 'Some dummy data', NOW() FROM t1;
-- Up to 128k rows

CREATE TABLE t2 (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, fld VARCHAR(64)
, KEY (fld)
);

INSERT INTO t2 SELECT NULL, ROUND(RAND()*100000, 0);
INSERT INTO t2 SELECT NULL, ROUND(RAND()*100000, 0) FROM t2;
-- Up to 512k rows

EXPLAIN
SELECT *
  FROM t1
  JOIN t2 ON SUBSTR(t2.fld, 1, 8) = t1.id
 WHERE t1.id BETWEEN 100 and 1000
;
+------+-------------+-------+--------+---------------+---------+---------+------+---------+-------------+
| id   | select_type | table | type   | possible_keys | key     | key_len | ref  | rows    | Extra       |
+------+-------------+-------+--------+---------------+---------+---------+------+---------+-------------+
|    1 | SIMPLE      | t2    | index  | NULL          | fld     | 67      | NULL | 1046834 | Using index |
|    1 | SIMPLE      | t1    | eq_ref | PRIMARY       | PRIMARY | 4       | func | 1       | Using where |
+------+-------------+-------+--------+---------------+---------+---------+------+---------+-------------+

Execution time: 450 ms

+------+-------------+-------+-------+---------------+---------+---------+------------+------+-------------+
| id   | select_type | table | type  | possible_keys | key     | key_len | ref        | rows | Extra       |
+------+-------------+-------+-------+---------------+---------+---------+------------+------+-------------+
|    1 | SIMPLE      | t1    | range | PRIMARY       | PRIMARY | 4       | NULL       | 448  | Using where |
|    1 | SIMPLE      | t2    | ref   | vnum          | vnum    | 5       | test.t1.id | 5    |             |
+------+-------------+-------+-------+---------------+---------+---------+------------+------+-------------+

Execution time: 23 ms

Optimizer is not considering ICP

CREATE TABLE test ...
INSERT INTO test SELECT NULL, 'Some data which are not relevant', NOW();
INSERT INTO test SELECT NULL, 'Some data which are not relevant', NOW() FROM test;
-- Repeat this step until you have 1 Mio rows
ALTER TABLE test ADD INDEX (ts);

EXPLAIN
SELECT * FROM test WHERE ts = '2024-03-13 10:35:23';
+------+-------------+-------+------+---------------+------+---------+-------+------+-------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra |
+------+-------------+-------+------+---------------+------+---------+-------+------+-------+
|    1 | SIMPLE      | test  | ref  | ts            | ts   | 5       | const | 1000 |       |
+------+-------------+-------+------+---------------+------+---------+-------+------+-------+

EXPLAIN
SELECT * FROM test WHERE ts = CAST('2024-03-13 10:35:23' AS DATETIME);
+------+-------------+-------+------+---------------+------+---------+-------+------+-----------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra                 |
+------+-------------+-------+------+---------------+------+---------+-------+------+-----------------------+
|    1 | SIMPLE      | test  | ref  | ts            | ts   | 5       | const | 1000 | Using index condition |
+------+-------------+-------+------+---------------+------+---------+-------+------+-----------------------+

The second case should be faster but is in simple cases slightly slower. Should have a significant impact if data are located on disk.

OR optimization with Materialized View

CREATE TABLE `t1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `data1` varchar(128) DEFAULT NULL,
  `data2` varchar(128) DEFAULT NULL,
  PRIMARY KEY (`id`)
);

INSERT INTO t1 SELECT NULL, 'Some data which are not relevant', NOW();
INSERT INTO t1 SELECT NULL, 'Some data which are not relevant', NOW() FROM t1;
...

UPDATE t1 SET data1 = MD5(id);
UPDATE t1 SET data2 = MD5(data1);

ALTER TABLE t1 ADD INDEX(data1), ADD INDEX (data2);

EXPLAIN
SELECT * FROM t1
 WHERE data1 = 'c4ca4238a0b923820dcc509a6f75849b'
    OR data2 = '8a6dbf554746d73e62de6e21509b6ee2'
;
+------+-------------+-------+-------------+---------------+-------------+---------+------+------+---------------------------------------+
| id   | select_type | table | type        | possible_keys | key         | key_len | ref  | rows | Extra                                 |
+------+-------------+-------+-------------+---------------+-------------+---------+------+------+---------------------------------------+
|    1 | SIMPLE      | t1    | index_merge | data1,data2   | data1,data2 | 131,131 | NULL | 2    | Using union(data1,data2); Using where |
+------+-------------+-------+-------------+---------------+-------------+---------+------+------+---------------------------------------+

Pretty fast, possibly faster if index on (data1, data2). Now the real problem:

CREATE table t2 LIKE t1;
ALTER TABLE t2 ADD COLUMN t1_id INT UNSIGNED NOT NULL;
INSERT INTO t2 SELECT id, data1, data2, id FROM t1;
ALTER TABLE t2 ADD INDEX (t1_id);

EXPLAIN
SELECT t1.id, t1.data1, t2.data2
  FROM t1
  JOIN t2 ON t1.id = t2.t1_id
 WHERE t1.data1 = 'c4ca4238a0b923820dcc509a6f75849b'
    OR t2.data2 = '8a6dbf554746d73e62de6e21509b6ee2'
;
+------+-------------+-------+--------+---------------+---------+---------+---------------+---------+-------------+
| id   | select_type | table | type   | possible_keys | key     | key_len | ref           | rows    | Extra       |
+------+-------------+-------+--------+---------------+---------+---------+---------------+---------+-------------+
|    1 | SIMPLE      | t2    | ALL    | data2,t1_id   | NULL    | NULL    | NULL          | 1042503 |             |
|    1 | SIMPLE      | t1    | eq_ref | PRIMARY,data1 | PRIMARY | 4       | test.t2.t1_id | 1       | Using where |
+------+-------------+-------+--------+---------------+---------+---------+---------------+---------+-------------+

Horribly slow! Run time: 3.0 sec because of full table scan on table t2. Optimizer cannot do index_merge operation on 2 different tables.

Solution 1:

EXPLAIN
SELECT t1.id, t1.data1, t2.data2
  FROM t1
  JOIN t2 ON t1.id = t2.t1_id
 WHERE t1.data1 = 'c4ca4238a0b923820dcc509a6f75849b'
UNION
SELECT t1.id, t1.data1, t2.data2
  FROM t1
  JOIN t2 ON t1.id = t2.t1_id
 WHERE t2.data2 = '8a6dbf554746d73e62de6e21509b6ee2'
;
+------+--------------+------------+--------+---------------+---------+---------+---------------+------+--------------------------+
| id   | select_type  | table      | type   | possible_keys | key     | key_len | ref           | rows | Extra                    |
+------+--------------+------------+--------+---------------+---------+---------+---------------+------+--------------------------+
|    1 | PRIMARY      | t1         | ref    | PRIMARY,data1 | data1   | 131     | const         | 1    | Using where; Using index |
|    1 | PRIMARY      | t2         | ref    | t1_id         | t1_id   | 4       | test.t1.id    | 1    |                          |
|    2 | UNION        | t2         | ref    | data2,t1_id   | data2   | 131     | const         | 1    | Using index condition    |
|    2 | UNION        | t1         | eq_ref | PRIMARY       | PRIMARY | 4       | test.t2.t1_id | 1    |                          |
| NULL | UNION RESULT | <union1,2> | ALL    | NULL          | NULL    | NULL    | NULL          | NULL |                          |
+------+--------------+------------+--------+---------------+---------+---------+---------------+------+--------------------------+

Much faster. Run time: 0.001 sec. If this cannot be done because of application reasons try solution 2:

CREATE TABLE t1t2mv AS SELECT t1.id, t1.data1, t2.data2 FROM t1 JOIN t2 ON t1.id = t2.t1_id;
ALTER TABLE t1t2mv ADD PRIMARY KEY (id), ADD INDEX (data1), ADD INDEX (data2);

EXPLAIN
SELECT id, data1, data2
  FROM t1t2mv
 WHERE data1 = 'c4ca4238a0b923820dcc509a6f75849b'
    OR data2 = '8a6dbf554746d73e62de6e21509b6ee2'
;
+------+-------------+--------+-------------+---------------+-------------+---------+------+------+---------------------------------------+
| id   | select_type | table  | type        | possible_keys | key         | key_len | ref  | rows | Extra                                 |
+------+-------------+--------+-------------+---------------+-------------+---------+------+------+---------------------------------------+
|    1 | SIMPLE      | t1t2mv | index_merge | data1,data2   | data1,data2 | 131,131 | NULL | 2    | Using union(data1,data2); Using where |
+------+-------------+--------+-------------+---------------+-------------+---------+------+------+---------------------------------------+

Run time: 0.004 sec.

See also: