You are here

To UNION or not to UNION...

Recently a forum question [ 1 ] got my attention:

Is there any performance issue with Union?

I used union all sometime back and it was performance issue just to make an opinion that we should used union in query.

The question itself was not too interesting because the answer is easy: It depends. But I wanted to see if there was an improvement in this common problem over time in MySQL.

Test set-up

So I prepared a little test to simulate some of the possible scenarios:

CREATE TABLE `u` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `a` int(10) unsigned DEFAULT NULL,
  `b` int(10) unsigned DEFAULT NULL,
  `c` int(10) unsigned DEFAULT NULL,
  `d` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `a` (`a`),
  KEY `b` (`b`),
  KEY `c` (`c`),
  KEY `d` (`d`)
) ENGINE=InnoDB
;

INSERT INTO u SELECT NULL, ROUND(RAND()*10, 0), ROUND(RAND()*10, 0), ROUND(RAND()*1000000, 0), ROUND(RAND()*1000000, 0);
INSERT INTO u SELECT NULL, ROUND(RAND()*10, 0), ROUND(RAND()*10, 0), ROUND(RAND()*1000000, 0), ROUND(RAND()*1000000, 0) FROM u;
... 1 mio rows

ANALYZE TABLE u;

With this table we can simulate the OR problem with low and high selectivity.

Running the tests

We did the tests with MySQL (5.0 - 5.7), Percona Server (5.6) and MariaDB (5.5, 10.0) for the following queries:

EXPLAIN SELECT * FROM u WHERE a = 5 OR b = 5;
EXPLAIN SELECT * FROM u WHERE a = 5 OR c = 500001;
EXPLAIN SELECT * FROM u WHERE c = 500001 OR d = 500001;

We are interested in what the optimizer is doing and what the performance of the queries is. The following results came out:

Query 1 Query 2 Query 3
Database version rows avg. time QEP rows avg. time QEP rows avg. time QEP
MySQL 5.0.92 194402 390 ms 1 104876 230 ms 2 6 < 10 ms 3
MySQL 5.1.66 194402 410 ms 1 104876 240 ms 2 6 < 10 ms 3
MySQL 5.5.24 194402 420 ms 1 104876 370 ms 1 6 < 10 ms 3
MariaDB 5.5.32 194402 460 ms 1 104876 420 ms 1 6 < 10 ms 3
MySQL 5.6.12 194402 440 ms 2 104876 240 ms 2 6 < 10 ms 3
Percona 5.6.12-60.40 194402 450 ms 2 104876 240 ms 2 6 < 10 ms 3
MySQL 5.7.1 194402 420 ms 2 104876 220 ms 2 6 < 10 ms 3
MariaDB 10.0.3 194402 450 ms 1 104876 400 ms 1 6 < 10 ms 3

Different Query Execution Plans (QEP)

  • QEP 1:
    +----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
    | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
    +----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
    |  1 | SIMPLE      | u     | ALL  | a,b           | NULL | NULL    | NULL | 1049134 | Using where | 
    +----+-------------+-------+------+---------------+------+---------+------+---------+-------------+

  • QEP 2:
    +----+-------------+-------+-------------+---------------+------+---------+------+--------+-------------------------------+
    | id | select_type | table | type        | possible_keys | key  | key_len | ref  | rows   | Extra                         |
    +----+-------------+-------+-------------+---------------+------+---------+------+--------+-------------------------------+
    |  1 | SIMPLE      | u     | index_merge | a,c           | a,c  | 5,5     | NULL | nnnnnn | Using union(a,c); Using where | 
    +----+-------------+-------+-------------+---------------+------+---------+------+--------+-------------------------------+

  • QEP 3:
    +----+-------------+-------+-------------+---------------+------+---------+------+------+-------------------------------+
    | id | select_type | table | type        | possible_keys | key  | key_len | ref  | rows | Extra                         |
    +----+-------------+-------+-------------+---------------+------+---------+------+------+-------------------------------+
    |  1 | SIMPLE      | u     | index_merge | c,d           | c,d  | 5,5     | NULL |    n | Using union(c,d); Using where | 
    +----+-------------+-------+-------------+---------------+------+---------+------+------+-------------------------------+

  • Conclusion

    • Single query performance went down from 5 - 50% (in one case increased by 5%) over time (MySQL releases). But we can see some impacts on optimizer improvements.
    • Newer MySQL releases are not necessarily faster for single-query performance than older ones. Most of the MySQL users are not running more than 1 or 2 concurrent queries. For them scalability improvements are not really an issue.
    • There seems to be some changes in the Optimizer some for good, some for bad, depending on the release or branch/fork you are using. So test carefully when you change the release or branch/fork.
    • And: Do not believe the whole marketing yelling but do your own testing...