You are here

Impact of column types on MySQL JOIN performance

In our MySQL trainings and consulting engagements we tell our customers always to use the smallest possible data type to get better query performance. Especially for the JOIN columns. This advice is supported as well by the MySQL documentation in the chapter Optimizing Data Types:

Use the most efficient (smallest) data types possible. MySQL has many specialized types that save disk space and memory. For example, use the smaller integer types if possible to get smaller tables. MEDIUMINT is often a better choice than INT because a MEDIUMINT column uses 25% less space.

I remember somewhere the JOIN columns where explicitly mentioned but I cannot find it any more.

Test set-up

To get numbers we have created a little test set-up:

CREATE TABLE `a` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT
, `data` varchar(64) DEFAULT NULL
, `ts` timestamp NOT NULL
, PRIMARY KEY (`id`)
) ENGINE=InnoDB CHARSET=latin1
 

CREATE TABLE `b` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT
, `data` varchar(64) DEFAULT NULL
, `ts` timestamp NOT NULL
, `a_id` int(10) unsigned DEFAULT NULL
, PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

1048576 rows 16777216 rows
The following query was used for the test:
EXPLAIN SELECT * FROM a JOIN b ON b.a_id = a.id WHERE a.id BETWEEN 10000 AND 15000;
+----+-------------+-------+--------+---------------+---------+---------+-------------+----------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref         | rows     | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+-------------+----------+-------------+
|  1 | SIMPLE      | b     | ALL    | NULL          | NULL    | NULL    | NULL        | 16322446 | Using where |
|  1 | SIMPLE      | a     | eq_ref | PRIMARY       | PRIMARY | 4       | test.b.a_id |        1 | NULL        |
+----+-------------+-------+--------+---------------+---------+---------+-------------+----------+-------------+

And yes: I know this query could be more optimal by setting an index on b.a_id.

Results

The whole workload was executed completely in memory and thus CPU bound (we did not want to measure the speed of our I/O system).

SEJOIN columnbytesquery timeGainSpaceCharacter set
InnoDBMEDIUMINT35.28 s96%4% faster75%
InnoDBINT45.48 s100%100%100%
InnoDBBIGINT85.65 s107%7% slower200%
InnoDBNUMERIC(7, 2)~46.77 s124%24% slower~100%
InnoDBVARCHAR(7)7-86.44 s118%18% slower~200%latin1
InnoDBVARCHAR(16)7-86.44 s118%18% slower~200%latin1
InnoDBVARCHAR(32)7-86.42 s118%18% slower~200%latin1
InnoDBVARCHAR(128)7-86.46 s118%18% slower~200%latin1
InnoDBVARCHAR(256)8-96.17 s114%14% slower~225%latin1
InnoDBVARCHAR(16)7-86.96 s127%27% slower~200%utf8
InnoDBVARCHAR(128)7-86.82 s124%24% slower~200%utf8
InnoDBCHAR(16)166.85 s125%25% slower400%latin1
InnoDBCHAR(128)1289.68 s177%77% slower3200%latin1
InnoDBTEXT8-910.7 s195%95% slower~225%latin1
MyISAMINT43.16 s58%42% faster
TokuDBINT44.52 s82%18% faster

Some comments to the tests:

  • MySQL 5.6.13 was used for most of the tests.
  • TokuDB v7.1.0 was tested with MySQL 5.5.30.
  • As results the optimistic cases were taken. In reality the results can be slightly worse.
  • We did not take into consideration that bigger data types will eventually cause more I/O which is very slow!

Commands

ALTER TABLE a CONVERT TO CHARACTER SET latin1;
ALTER TABLE b CONVERT TO CHARACTER SET latin1;

ALTER TABLE a MODIFY COLUMN id INT UNSIGNED NOT NULL;
ALTER TABLE b MODIFY COLUMN a_id INT UNSIGNED NOT NULL;

Taxonomy upgrade extras: 

Comments

Hi! It's an interesting (and well designed) test case. In single threaded workloads 5.6 can be a little slower than 5.5, so it may have advantaged TokuDB just a little bit. I would be interested to see if you are able to re-run just INT performance across MySQL 5.5 / 5.6 / 5.7.
morgocomment

Hi Morgan,

Thanks!

I have planed such a test already a while ago but did not find the time to do it yet...

In my mind I have some numbers showing that since MySQL 4.0 single query performance has decreased significantly...

This leads to the question: Where are all the MySQL branches and forks heading to and is this in the interest of the majority of the MySQL users or just in the interest of a few ones like Facebook, LinkeId, Google, Booking.com etc...?

Regards,
Oli

Shinguzcomment

Hi Oli,

thanks for the nice comparison chart !
In InnoDB, isn't (Tiny|Small|Medium)-int internally aligned to 4 bytes, and thus stored as 4bytes int ?
How many iterations did you run the query ?

Thanks in advance!
/Joffrey

joffreycomment

Hi Joffrey,

The easy answer first: 5 - 15 iterations until it got a stable response time and then I took the most optimistic value. Not very scientific, I know... But I think good enough for a reliable statement.

About InnoDB 4 byte alignment. No, it is not.

Some proves beside consulting various documentations from InnoDB table monitor:

CREATE TABLE `align` (
  `id` int(11) NOT NULL DEFAULT '0',
  `tiny` tinyint(4) DEFAULT NULL,
  `small` smallint(6) DEFAULT NULL,
  `medium` mediumint(9) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

+----+------+-------+--------+
| id | tiny | small | medium |
+----+------+-------+--------+
|  1 |    1 |     1 |      1 |
|  2 |    2 |     2 |      2 |
|  3 |    3 |     3 |      3 |
+----+------+-------+--------+

TABLE: name test/align, id 97, flags 1, columns 7, indexes 1, appr.rows 3

COLUMNS: id: DATA_INT DATA_BINARY_TYPE DATA_NOT_NULL len 4;
         tiny: DATA_INT DATA_BINARY_TYPE len 1;
         small: DATA_INT DATA_BINARY_TYPE len 2;
         medium: DATA_INT DATA_BINARY_TYPE len 3;
         DB_ROW_ID: DATA_SYS prtype 256 len 6;
         DB_TRX_ID: DATA_SYS prtype 257 len 6;
         DB_ROLL_PTR: DATA_SYS prtype 258 len 7;

INDEX: name PRIMARY, id 175, fields 1/6, uniq 1, type 3
       root page 3, appr.key vals 3, leaf pages 1, size pages 1

FIELDS:  id DB_TRX_ID DB_ROLL_PTR tiny small medium

And from any hexdump utility:

       00          04          08          0c
0x0000 e2 ad 38 b2 00 00 00 03 ff ff ff ff ff ff ff ff  ..8.............
0x0010 00 00 00 05 52 19 39 78 45 bf 00 00 00 00 00 00  ....R.9xE.......
0x0020 00 00 00 00 00 53 00 02 00 cf 80 05 00 00 00 00  .....S..........
0x0030 00 b8 00 02 00 02 00 03 00 00 00 00 00 00 00 00  ................
0x0040 00 00 00 00 00 00 00 00 00 af 00 00 00 53 00 00  .............S..
0x0050 00 02 00 f2 00 00 00 53 00 00 00 02 00 32 01 00  .......S.....2..
0x0060 02 00 1b 69 6e 66 69 6d 75 6d 00 04 00 0b 00 00  ...infimum......
0x0070 73 75 70 72 65 6d 75 6d 00 00 00 10 00 1d 80 00  supremum........
0x0080 00 01 00 00 00 00 ba 0c 8c 00 00 01 8e 01 10 81  ................
0x0090 80 01 80 00 01 00 00 00 18 00 1d 80 00 00 02 00  ................
0x00a0 00 00 00 ba 0c 8c 00 00 01 8e 01 1c 82 80 02 80  ................
0x00b0 00 02 00 00 00 20 ff b8 80 00 00 03 00 00 00 00  ..... ..........
0x00c0 ba 0c 8c 00 00 01 8e 01 28 83 80 03 80 00 03 00  ........(.......
0x00d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
olicomment

Hi Joffrey,

I think NDB tables use 4-byte alignment not InnoDB:

NDB tables use 4-byte alignment; all NDB data storage is done in multiples of 4 bytes. Thus, a column value that would typically take 15 bytes requires 16 bytes in an NDB table. For example, in NDB tables, the TINYINT, SMALLINT, MEDIUMINT, and INTEGER (INT) column types each require 4 bytes storage per record due to the alignment factor.

Check this manual page for reference.

Thanks,
Abdel-Mawla

abdel-mawlacomment