You are here

Databases are standardized but in detail they behave different

For a fancy application we want to query a chunk of rows from a table and therefore we need the minimum and the maximum of the Primary Key of these rows.
Because InnoDB is an Index Organized Table or Index Clustered Table we know that this access will use the Primary Key. But to be sure and to be compliant with the standard (and compatible) we use and ORDER BY on the Primary Key.

MySQL 5.7

First we create some test data:

mysql> CREATE TABLE t_my (
  ID CHAR(32) NOT NULL PRIMARY KEY
) ENGINE = InnoDB;

mysql> INSERT INTO t_my
SELECT MD5(RAND())
FROM t_my;
... create more than 10 rows

mysql> SELECT id FROM t_my ORDER BY id LIMIT 11;
+----------------------------------+
| id                               |
+----------------------------------+
| 01a6e76643c83c91867636ce90a8def5 |
| 0ea1b1670343b4e70dd449207c720957 |
| 141ec92e809c1d6af83d27e8a3e74fe7 |
| 1605890e2c0244b019e6f66cc94790f2 |
| 19826d67b6013ed3bc1105b9708959c4 |
| 1a9ffd320187831df939d596c9a50aa1 |
| 24ae3a883803f5ae8416754593cd881c |
| 27e614f1b4490a6db1b26364e467d361 |
| 285e3d84b81d97a40d66049d2f30071f |
| 2db85e2f2639d637ee21888ca34334d7 |
| 2f0e944ca977826730c352a1920cda1f |
+----------------------------------+
11 rows in set (0.00 sec)

Now comes the interesting part: We want the minimum and the maximum of the first chunk:

mysql> SELECT MIN(id), MAX(id) FROM t_my ORDER BY id LIMIT 10;
+----------------------------------+----------------------------------+
| MIN(id)                          | MAX(id)                          |
+----------------------------------+----------------------------------+
| 01a6e76643c83c91867636ce90a8def5 | f685b7269d76f47e7517cdd5fc4253bf |
+----------------------------------+----------------------------------+

And this is completely not expected (aka wrong?)! Instead of the highest Primary Key of the chunk MySQL returns the highest Primary Key of the whole table:

mysql> SELECT MIN(id), MAX(id) FROM t_my;
+----------------------------------+----------------------------------+
| MIN(id)                          | MAX(id)                          |
+----------------------------------+----------------------------------+
| 01a6e76643c83c91867636ce90a8def5 | f685b7269d76f47e7517cdd5fc4253bf |
+----------------------------------+----------------------------------+

If we look at how this is executed we see that the MySQL Optimizer took some kind of short cut:

+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+

And from this result it does a LIMIT 10 on just one tuple which ends up in the result above. If this is a bug or not I cannot say. But it is at least not what I want.

Now we want to know how other RDBMS are dealing with this problem:

PostgreSQL 11

shell> su - postgres
shell> psql

postgres=# CREATE TABLE t_pg (
  id char(32) PRIMARY KEY
);

postgres=# INSERT INTO t_pg
SELECT MD5(RANDOM()::TEXT)
FROM t_pg;
... create more that 10 rows

postgres=# SELECT id FROM t_pg LIMIT 11;
                id                
----------------------------------
 9433b59ec7e14b1232743b3bdcc745a0
 56c59c2ce35e79b8f4141160b6dbcb69
 dbfe35456b12b741c7e20a973a65fcac
 f5bbd52a92c7c0f63b5bdf14e0b1b020
 e9b11d9243c701155f43506f7da95076
 aba50026e35562d867398ddb5e1ffc37
 586b98cfb8d7b19bf09f32e611298be5
 3fd768fda852972d096a015be675233c
 8c33a72edf0479298093b83a2d53ad59
 98dfbe2979df25d8169747ee15bced07
 5f7594d8b9de2694b4438d62579d658d
(11 rows)

Now comes the interesting part:

postgres=# SELECT MIN(id), MAX(id) FROM t_pg ORDER BY id LIMIT 10;
ERROR:  column "t_pg.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT MIN(id), MAX(id) FROM t_pg ORDER BY id LIMIT 10;
                                                   ^

PostgreSQL does NOT even allow this query. And complains about the id in the ORDER BY clause. If we write the query "correctly" we get a completely different result. Which is not usable for us:

postgres=# SELECT MIN(id), MAX(id) FROM t_pg GROUP BY id ORDER BY id LIMIT 10;
               min                |               max                
----------------------------------+----------------------------------
 03581a898bcedb0fb2bbb842be2fdaf5 | 03581a898bcedb0fb2bbb842be2fdaf5
 277ea3f40d4431cb9f41ac37848605f0 | 277ea3f40d4431cb9f41ac37848605f0
 3d007edf4cb9b9ffed10d74ef30f6a4b | 3d007edf4cb9b9ffed10d74ef30f6a4b
 3fd768fda852972d096a015be675233c | 3fd768fda852972d096a015be675233c
 56c59c2ce35e79b8f4141160b6dbcb69 | 56c59c2ce35e79b8f4141160b6dbcb69
 586b98cfb8d7b19bf09f32e611298be5 | 586b98cfb8d7b19bf09f32e611298be5
 591ab5d306827cc7a8a3f5d9ee780edc | 591ab5d306827cc7a8a3f5d9ee780edc
 5c33d18b907638956469d54630307b9d | 5c33d18b907638956469d54630307b9d
 5f7594d8b9de2694b4438d62579d658d | 5f7594d8b9de2694b4438d62579d658d
 6245ee76fbbe48d99b359deda7e38c0a | 6245ee76fbbe48d99b359deda7e38c0a
(10 rows)

If we do it this way, which is not what we want, we get, similar to MySQL the "wrong" result.

postgres=# SELECT MIN(id), MAX(id) FROM t_pg LIMIT 10;
               min                |               max                
----------------------------------+----------------------------------
 03581a898bcedb0fb2bbb842be2fdaf5 | f5bbd52a92c7c0f63b5bdf14e0b1b020

This result is the same like the maximum Primary Key:

postgres=# SELECT MIN(id), MAX(id) FROM t_pg;
               min                |               max                
----------------------------------+----------------------------------
 03581a898bcedb0fb2bbb842be2fdaf5 | f5bbd52a92c7c0f63b5bdf14e0b1b020

If PostgreSQL does it like this, it at least according to the standards, I hope. But NOT what I want.

SQLite 3

The third candidate we have chosen is SQLite3:

sqlite> CREATE TABLE t_sl (
  id TEXT PRIMARY KEY
);

Here was the challenging part to make the MD5() function work. Luckily we found a nearly perfect solution on StackOverflow.

sqlite> INSERT INTO t_sl
SELECT HEX(MD5(RANDOM()))
FROM t_sl;
... create more that 10 rows

sqlite> SELECT id FROM t_sl LIMIT 11;
467FE9B4EC744D1B4C21C1405936E863
F7A2E0BF53EA5243A734A1FAACCD1D28
3DE4FC5680C9F2E3AB9E7EA4BE7F6D69
0878C0298916B1FBFE7808263CA1703D
56332C0BC2EBCB3D960167CF475B9581
9F8661DE560EF8040B205A58224A2251
5F8A7807F56E604DC8BB595FE0F579B4
A015A9539F3966930F17EE4B545271F6
714E544157E871CE826E5923F84AA096
5FD9F4F71739AB75BD60B94F303973AA
96113F12CDDFEC20E98BA621783E0A6C

Same results as MySQL and PostgreSQL and still not what I want:

sqlite> SELECT MIN(id), MAX(id) FROM t_sl ORDER BY id LIMIT 10;
0249461D8D3516D513F18DE3BC4CE677|FB75ECCB85BDDA88E5B8D48CF056B1CC

sqlite> SELECT MIN(id), MAX(id) FROM t_sl;
0249461D8D3516D513F18DE3BC4CE677|FB75ECCB85BDDA88E5B8D48CF056B1CC

Oracle 18

Now let us have a look how Oracle does it:

sqlplus> CREATE TABLE t_ora (
id CHAR(32) NOT NULL PRIMARY KEY
);

sqlplus> INSERT INTO t_ora SELECT STANDARD_HASH(dbms_random.random, 'MD5') FROM dual;
... create more that 10 rows

sqlplus> SELECT DISTINCT id FROM t_ora;
264270A59D9EE04668C8F298DF3DF184
CCE8CA725CD633FC1AC2C73C32F0EAF4
9B4C28001530BA8FA8F682597576B88C
215D03A9E409D99C6EB9EAD11CD722CA
770F99E6D2A4929DEE4D54214D1C99E4
39A48517FB5B58403C85317F02AFC167
7E2D9C597602637634C7164811C3FA15
1826607210B081381254F7D1D061B25E
6984D425379806E16AA81424438003BA
EBBCAE0E0B263A223D70E94D1020CCE9
7E854FFBAA1EAB1D7E18ADB085975C40
06215B1756DA5926D27BDB0BC47DDEA9
D3CC548842BD1F978326CAED25518533
E80B9F42585E42F1AAE3726EC49FEE7F
62FA08D4EF65DEBEC08F219C1DC4F583
7ADECA10EACA57F9D75AD1CDB4E7C965

Here we get completely confusing results with ROWNUM:

sqlplus> SELECT MIN(id), MAX(id) FROM t_ora WHERE ROWNUM <= 10;
MIN(ID)                          MAX(ID)
-------------------------------- --------------------------------
1826607210B081381254F7D1D061B25E EBBCAE0E0B263A223D70E94D1020CCE9

sqlplus> SELECT MIN(id), MAX(id) FROM t_ora WHERE ROWNUM <= 5;

MIN(ID)                          MAX(ID)
-------------------------------- --------------------------------
06FDD744640D35C1E527AB05B8379349 C5EA7113397CF161A951ECBB80E1DFEF

sqlplus> SELECT MIN(id), MAX(id) FROM t_ora WHERE ROWNUM <= 6;

MIN(ID)                          MAX(ID)
-------------------------------- --------------------------------
06FDD744640D35C1E527AB05B8379349 E6692DFBF57EB110CAE2169A4204C37B

sqlplus> SELECT MIN(id), MAX(id) FROM t_ora WHERE ROWNUM <= 7;

MIN(ID)                          MAX(ID)
-------------------------------- --------------------------------
06FDD744640D35C1E527AB05B8379349 E6692DFBF57EB110CAE2169A4204C37B

sqlplus> SELECT MIN(id), MAX(id) FROM t_ora WHERE ROWNUM <= 8;

MIN(ID)                          MAX(ID)
-------------------------------- --------------------------------
06FDD744640D35C1E527AB05B8379349 E6692DFBF57EB110CAE2169A4204C37B

sqlplus> SELECT MIN(id), MAX(id) FROM t_ora WHERE ROWNUM <= 9;

MIN(ID)                          MAX(ID)
-------------------------------- --------------------------------
06FDD744640D35C1E527AB05B8379349 EF34D8DBC0832BFD813939FB1840804D

sqlplus> SELECT MIN(id), MAX(id) FROM t_ora WHERE ROWNUM <= 2 order by id ;

MIN(ID)                          MAX(ID)
-------------------------------- --------------------------------
B54068FF7AC56D6D8200F4E44410DCC6 C5EA7113397CF161A951ECBB80E1DFEF

sqlplus> SELECT MIN(id), MAX(id) FROM t_ora WHERE ROWNUM <= 3 order by id ;

MIN(ID)                          MAX(ID)
-------------------------------- --------------------------------
5D7FB42D47FB5F6F220B3872B48AC8ED C5EA7113397CF161A951ECBB80E1DFEF

sqlplus> SELECT MIN(id), MAX(id) FROM t_ora WHERE ROWNUM <= 4 order by id;

MIN(ID)                          MAX(ID)
-------------------------------- --------------------------------
1CF93D49926CC3D9F83FC639B423A7F7 C5EA7113397CF161A951ECBB80E1DFEF

This is explainable according to my contact because Oracle does not guarantee ROWNUM and it is a new set of data (ROWNUM Pseudocolumn).

But rewriting the query would at least give us the right result.

sqlplus> SELECT MIN(id), MAX(id) FROM (SELECT id FROM t_ora ORDER BY id) WHERE ROWNUM <= 5 ;

MIN(ID)                          MAX(ID)
-------------------------------- --------------------------------
06FDD744640D35C1E527AB05B8379349 32ABDC3657FAE00B8EEE6EB2C42C12F1

sqlplus> SELECT MIN(id), MAX(id) FROM (SELECT id FROM t_ora ORDER BY id) WHERE ROWNUM <= 6 ;

MIN(ID)                          MAX(ID)
-------------------------------- --------------------------------
06FDD744640D35C1E527AB05B8379349 38FC2D4348D7A44C834D008F5B7BBD5E

I am pretty sure that this Query Execution Plan on Oracle would also look pretty bad and will not perform on huge data sets as we plan to have for our fancy application.

And in this case Oracle behaves like the other databases:

sqlplus> SELECT MIN(id), MAX(id) FROM t_ora FETCH FIRST 10 ROWS ONLY ;

MIN(ID)                          MAX(ID)
-------------------------------- --------------------------------
06FDD744640D35C1E527AB05B8379349 F5D8266D3646C57F6AB3E6001D305A2C

Thanks to Markus R. for assistance on Oracle!

SQL Server 2019

And finally the last candidate to test is Microsoft SQL Server 2019 on Ubuntu 18.04:

shell> sqlcmd -S localhost -U SA

mssql> CREATE TABLE t_ms (
  ID CHAR(32) NOT NULL PRIMARY KEY
)
GO

mssql> INSERT INTO t_ms
SELECT CONVERT(VARCHAR(32), HashBytes('MD5', STR(RAND(), 25, 20)), 2)
GO
... create more than 10 rows

mssql> SELECT id FROM t_ms ORDER BY id OFFSET 0 ROWS FETCH NEXT 11 ROWS ONLY
GO
id                              
--------------------------------
3203A25102554923CA11BD80C99D2728
3A0D79AA2466AE0EA580295FD5C81145
3DFE9A0A1FDD2654C6BBB24680D13B15
6A5CF25DDFE0278674EE98E02B5C4B38
8CAC11376C31E22E4AEF8214F31AA36B
96C5362832286577A3FB72A840855DFA
A0CA369CAAE540A2A3E92317DC5B939F
A5543C6D1244357CA89DD8B16A85E9EF
B98148167FC627C0EDF632C981E9296B
C50B5BC636BC29D2EDB13C5C1749F7D9
E633FD894FBFD6CAA20C7C4182D8EEBC

(11 rows affected)

Similar error like PostgreSQL:

mssql> SELECT MIN(id), MAX(id) FROM t_ms ORDER BY id OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY
GO
Msg 8127, Level 16, State 1, Server ubuntu1804, Line 10
Column "t_ms.ID" is invalid in the ORDER BY clause because it is not contained in either an aggregate function or the GROUP BY clause.

And unusable results as with the other RDBMS:

mssql> SELECT MIN(id), MAX(id) FROM t_ms GROUP BY id ORDER BY id OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY
GO
                                                                 
-------------------------------- --------------------------------
3203A25102554923CA11BD80C99D2728 3203A25102554923CA11BD80C99D2728
3A0D79AA2466AE0EA580295FD5C81145 3A0D79AA2466AE0EA580295FD5C81145
3DFE9A0A1FDD2654C6BBB24680D13B15 3DFE9A0A1FDD2654C6BBB24680D13B15
6A5CF25DDFE0278674EE98E02B5C4B38 6A5CF25DDFE0278674EE98E02B5C4B38
8CAC11376C31E22E4AEF8214F31AA36B 8CAC11376C31E22E4AEF8214F31AA36B
96C5362832286577A3FB72A840855DFA 96C5362832286577A3FB72A840855DFA
A0CA369CAAE540A2A3E92317DC5B939F A0CA369CAAE540A2A3E92317DC5B939F
A5543C6D1244357CA89DD8B16A85E9EF A5543C6D1244357CA89DD8B16A85E9EF
B98148167FC627C0EDF632C981E9296B B98148167FC627C0EDF632C981E9296B
C50B5BC636BC29D2EDB13C5C1749F7D9 C50B5BC636BC29D2EDB13C5C1749F7D9

(10 rows affected)

mssql> SELECT MIN(id), MAX(id) FROM t_ms
GO
                                                                 
-------------------------------- --------------------------------
3203A25102554923CA11BD80C99D2728 F8FFC36AB5AE80748CE6248EE9C4ACD8

(1 rows affected)

Microsoft SQL Server seems to behave similar to PostgreSQL. And IMHO Microsoft SQL Server is really unhandy on the CLI.

Some help to make MS SQL Server work:


I do not know why it did not accept the password in the first run.

Solution

The "correct" (aka wanted) result we get like this which is a similar solution as Markus proposed above:

mysql> SELECT MIN(id), MAX(id) FROM (SELECT id FROM t_my ORDER BY id LIMIT 10) AS x;
+----------------------------------+----------------------------------+
| MIN(id)                          | MAX(id)                          |
+----------------------------------+----------------------------------+
| 01a6e76643c83c91867636ce90a8def5 | 2db85e2f2639d637ee21888ca34334d7 |
+----------------------------------+----------------------------------+

The MySQL Query Execution Plan does not really look too cool but is acceptable:

+----+-------------+------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table      | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL   | NULL          | NULL    | NULL    | NULL |   10 |   100.00 | NULL        |
|  2 | DERIVED     | t_my       | NULL       | index | NULL          | PRIMARY | 32      | NULL |   10 |   100.00 | Using index |
+----+-------------+------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

Final thoughts

Linux (Debian and Ubuntu) is really cool because we were capable to test 4 of the these 5 different database products within a very short time (2 hours from scratch install). There is just one product which causes a bit more headache. But for this product one has friends...

And finally we found a completely different and much more generic approach for the problem to solve in our fancy application...

Taxonomy upgrade extras: 

Comments

Some of my colleagues mentioned Window Functions. Maybe it works. But I fear that window functions are not fast because they do some materialization in between? I have to test...

Shinguzcomment