
Transactional memory resident tables with PBXT

Introduction

In his presentation about PBXT at the DOAG Conference 2010, Paul McCullagh spoke about memory resident PBXT tables. They will be available in version 1.1 of the PBXT Storage Engine plugin. Memory resident PBXT tables should have characteristics similar to normal MySQL MEMORY tables. But in contrast to MEMORY tables, they are transactional and can handle BLOB and TEXT attributes.
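
A hypothetical example of such a table (the session_cache name and columns are made up; the STORAGE MEMORY clause is shown in detail later in this article):

mysql> CREATE TABLE session_cache (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, data TEXT
, ts   TIMESTAMP
) ENGINE = PBXT STORAGE MEMORY;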

An alternative to this feature would be the MySQL dynamic heap row patches provided by eBay and hosted at code.google.com.

In the following blog article we go through the installation of the PBXT Storage Engine plugin version 1.1 and show how to use PBXT memory resident tables.

Download

I have not found any binaries yet, so we have to compile them ourselves. The download is pretty simple:

shell> bzr branch lp:pbxt/1.1

Build

To build the pluggable PBXT Storage Engine we have to perform the typical 3 steps: configure, make and make install. But there are some pitfalls to avoid:

shell> ./configure --with-mysql=/home/mysql/source/mysql-5.1.53 \
--prefix=/home/mysql/product/mysql-5.1.53
shell> make

Pitfalls:

  • If you do NOT specify the --with-mysql option you will get the following error message:
    checking for mysql source code... configure: error: "no MySQL source
    found at /home/mysql/tmp"
  • If you did NOT run configure and make on the MySQL sources beforehand, you will get error messages like the following:
    checking for mysql source code... configure: error: "no Makefile found
    in source/mysql-5.1.53"
    include/my_global.h:80:23: error: my_config.h: No such file or directory
    include/my_global.h:815:2: error: #error "please add -DSTACK_DIRECTION=1
    or -1 to your CPPFLAGS"
    and compilation will fail. You have to run at least the following first:
    shell> ./configure
    shell> make
    on the MySQL sources to be able to compile the PBXT Storage Engine against them (see the consolidated build sequence after this list)!
  • If you want to run configure a second time with different options, make sure you remove the sources and download them from scratch again. Otherwise you may get unexpected results:
    "prefix is not considered during make install"
    https://bugs.launchpad.net/bugs/677802
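
Putting these pitfalls together, a complete build sequence would look roughly like this (the location of the PBXT checkout, ~/pbxt-1.1, is an assumption):

shell> cd /home/mysql/source/mysql-5.1.53
shell> ./configure
shell> make
shell> cd ~/pbxt-1.1
shell> ./configure --with-mysql=/home/mysql/source/mysql-5.1.53 \
--prefix=/home/mysql/product/mysql-5.1.53
shell> make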

Install

If you run:

shell> sudo make install

the files should be copied to their proper locations. Then load the pluggable storage engine into MySQL:

mysql> INSTALL PLUGIN PBXT SONAME 'libpbxt.so';

If you get an error message complaining that MySQL cannot find the storage engine plugin, you should set plugin_dir accordingly in your my.cnf:

mysql> SHOW GLOBAL VARIABLES LIKE 'plugin_dir';
+---------------+---------------------------------------------------+
| Variable_name | Value                                             |
+---------------+---------------------------------------------------+
| plugin_dir    | /home/mysql/product/mysql-5.1.53/lib/mysql/plugin |
+---------------+---------------------------------------------------+
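
If plugin_dir points to the wrong location, a minimal my.cnf entry would look like this (path taken from the output above):

[mysqld]
plugin_dir = /home/mysql/product/mysql-5.1.53/lib/mysql/plugin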

Verify

Then, after successful loading, you can see your PBXT Storage Engine:

mysql> SHOW PLUGINS;
+------------+----------+----------------+------------+---------+
| Name       | Status   | Type           | Library    | License |
+------------+----------+----------------+------------+---------+
...
| PBXT       | ACTIVE   | STORAGE ENGINE | libpbxt.so | GPL     |
+------------+----------+----------------+------------+---------+

With the following command you can see which version is currently loaded:

mysql> SHOW ENGINE PBXT STATUS\G
*************************** 1. row ***************************
  Type: PBXT
  Name:
Status:
101120 13:48:07 PBXT 1.1.01 RC STATUS OUTPUT
...

Or from the INFORMATION_SCHEMA, which IMHO reports the wrong information:

mysql> SELECT plugin_name, plugin_version, plugin_type_version
     , plugin_library_version
  FROM INFORMATION_SCHEMA.plugins
 WHERE plugin_name = 'PBXT';
+-------------+----------------+---------------------+------------------------+
| plugin_name | plugin_version | plugin_type_version | plugin_library_version |
+-------------+----------------+---------------------+------------------------+
| PBXT        | 0.1            | 50153.0             | 1.0                    |
+-------------+----------------+---------------------+------------------------+

Bug #677839: Plugin version is not correctly reported in information_schema

Tests

For the first simple tests we use the MyEnv test table:

mysql> source /home/mysql/myenv/sql/test_table.sql
mysql> alter table test engine = pbxt storage memory;

Unfortunately I have not yet found a way to see whether a table is a PBXT MEMORY table or not: neither with SHOW CREATE TABLE nor with SHOW TABLE STATUS nor with SELECT * FROM INFORMATION_SCHEMA.tables.

The only way I found was to look at the file system:

mysql> system ls -la
-rw-rw---- 1 mysql dba    4 2010-11-20 15:32 test-3.xtr
-rw-rw---- 1 mysql dba 8612 2010-11-20 15:32 test.frm
-rw-rw---- 1 mysql dba  155 2010-11-20 15:32 test.xtd
-rw-rw---- 1 mysql dba 4096 2010-11-20 15:32 test.xti

mysql> ALTER TABLE test ENGINE = PBXT STORAGE MEMORY;

mysql> system ls -la
-rw-rw---- 1 mysql dba 8612 2010-11-20 15:32 test.frm

Another possibility is:

mysql> system cat ./pbxt/tables
[table]
id=37
name=pbxt
location=./test
storage=disk
type=0
[table]
id=42
name=pbxt_memory_blob
location=./test
storage=heap
type=0

Playing around with the table, I have NOT yet found a way to convert a PBXT MEMORY table back to a DISK table. Neither of the following worked (see the workaround sketch below):

mysql> ALTER TABLE test ENGINE=PBXT;
mysql> ALTER TABLE test ENGINE=PBXT STORAGE DISK;
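
As a workaround one could copy the data over into a freshly created DISK table (an untested sketch; test_disk is a made-up name, and the column list matches the test table from above):

mysql> CREATE TABLE test_disk (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, data VARCHAR(64)
, ts   TIMESTAMP
) ENGINE = PBXT;
mysql> INSERT INTO test_disk SELECT * FROM test;
mysql> DROP TABLE test;
mysql> RENAME TABLE test_disk TO test;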

Memory usage of PBXT MEMORY tables

What first interested me about MEMORY tables was their actual memory consumption. MySQL MEMORY tables use fixed-size rows, so I was wondering how PBXT MEMORY tables behave.
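
The comparison tables were presumably created along these lines (a sketch, not the exact statements from the attached scripts; it assumes the filled test table from above):

mysql> CREATE TABLE pbxt_memory LIKE test;
mysql> ALTER TABLE pbxt_memory ENGINE = PBXT STORAGE MEMORY;
mysql> INSERT INTO pbxt_memory SELECT * FROM test;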

5 different tables with exactly the same content:

mysql> SHOW TABLE STATUS;
+------------------+--------+------------+---------+----------------+-------------+--------------+
| Name             | Engine | Row_format | Rows    | Avg_row_length | Data_length | Index_length |
+------------------+--------+------------+---------+----------------+-------------+--------------+
| memory           | MEMORY | Fixed      | 1048576 |             74 |    80069616 |      8549392 |
| pbxt             | PBXT   | Dynamic    | 1048576 |             88 |    92275712 |     12619776 |
| pbxt_memory      | PBXT   | Dynamic    | 1048576 |             88 |    92275712 |     12619776 |
| test             | MyISAM | Dynamic    | 1048576 |             20 |    20971520 |     10762240 |
| pbxt_memory_blob | PBXT   | Dynamic    | 1048576 |            527 |   552600576 |     12619776 |
+------------------+--------+------------+---------+----------------+-------------+--------------+

PBXT claims a dynamic row format for its memory table. But it seems that for a VARCHAR it allocates the full length: 88 bytes (14-26 bytes header + 4 bytes INT UNSIGNED + 64 bytes VARCHAR + 4 bytes TIMESTAMP).

For the table with the LONGTEXT field it looks even worse: for a LONGTEXT some additional memory is allocated, for whatever reason...

So MEMORY tables seem to work with PBXT, but they allocate roughly 6 times more memory (527 vs. 88 bytes per row) when we use BLOB/TEXT attributes instead of VARCHAR. Some more investigation has to be done here...

To limit the growth of MySQL MEMORY tables there is a parameter called max_heap_table_size. Given this behaviour of PBXT MEMORY tables, it would be desirable to have a similar parameter to restrict PBXT MEMORY tables. My suggestion: pbxt_max_memory_table_size?
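
For comparison, this is how the limit works for MEMORY tables today; a pbxt_max_memory_table_size could behave analogously (memory_small is a made-up name; note that max_heap_table_size is evaluated when the table is created):

mysql> SET SESSION max_heap_table_size = 16*1024*1024;
mysql> CREATE TABLE memory_small LIKE memory;
mysql> INSERT INTO memory_small SELECT * FROM memory;
ERROR 1114 (HY000): The table 'memory_small' is full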

Transactions

To find out whether transactions work as expected we run some simple tests. In the first example we kill a DELETE statement from another session:

mysql> SELECT COUNT(*) FROM pbxt_memory;
+----------+
| count(*) |
+----------+
|  1048576 |
+----------+
1 row in set (0.73 sec)

mysql> DELETE FROM pbxt_memory;
ERROR 1317 (70100): Query execution was interrupted

mysql> SELECT COUNT(*) FROM pbxt_memory;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    9
Current database: test

+----------+
| count(*) |
+----------+
|  1048576 |
+----------+
1 row in set (0.71 sec)

What we can see is that the count takes longer than on a MEMORY table. But this can be explained: PBXT has to open a transaction and really count the data, whereas MEMORY tables, because of their non-concurrent write behaviour, always know instantaneously how many rows a table contains.

In the next simple example we start a transaction, delete some rows and roll back:

mysql> START TRANSACTION;

mysql> DELETE FROM pbxt_memory;
Query OK, 1048576 rows affected (11.48 sec)

mysql> SELECT COUNT(*) FROM pbxt_memory;
+----------+
| count(*) |
+----------+
|        0 |
+----------+

mysql> ROLLBACK;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT COUNT(*) FROM pbxt_memory;
+----------+
| count(*) |
+----------+
|  1048576 |
+----------+
1 row in set (12.02 sec)

We can see interesting behaviour when we look at the timing of the commands: the ROLLBACK itself returns immediately (0.00 sec), but the following COUNT takes about as long as the DELETE did...

Indexing

For lookups of unique values in in-memory data structures, hash indexes perform better than tree indexes. For MEMORY tables there are HASH indexes for exactly this purpose (see the example below).
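
For comparison, on a plain MEMORY table a HASH index is declared like this (t_mem is a made-up name; for MEMORY tables HASH is even the default index type):

mysql> CREATE TABLE t_mem (
  id INT UNSIGNED NOT NULL
, KEY (id) USING HASH
) ENGINE = MEMORY;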

When we try the same thing with PBXT MEMORY tables it seems to work:

mysql> ALTER TABLE pbxt_memory ADD INDEX (ts) USING HASH;
Query OK, 1048576 rows affected (39.42 sec)
Records: 1048576  Duplicates: 0  Warnings: 0

mysql> SHOW CREATE TABLE pbxt_memory\G
*************************** 1. row ***************************
       Table: pbxt_memory
Create Table: CREATE TABLE `pbxt_memory` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `data` varchar(64) DEFAULT NULL,
  `ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `ts` (`ts`) USING HASH
) ENGINE=PBXT DEFAULT CHARSET=latin1
1 row in set (0.02 sec)

But Paul told me that PBXT does not know about HASH indexes. So I would expect at least a warning, if not an error message...
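
One way to check what was actually built would be SHOW INDEX, whose Index_type column reports the index implementation used (I have not verified what PBXT reports there):

mysql> SHOW INDEX FROM pbxt_memory;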

Partitioning with PBXT MEMORY tables

Just by accident I had the idea of creating a partitioned PBXT MEMORY table:

CREATE TABLE test (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, data VARCHAR(64)
, ts TIMESTAMP
)
ENGINE = PBXT STORAGE MEMORY
PARTITION BY RANGE( id )
(
  PARTITION p0 VALUES LESS THAN (1000)
, PARTITION p1 VALUES LESS THAN (2000)
, PARTITION p2 VALUES LESS THAN (3000)
, PARTITION p3 VALUES LESS THAN (4000)
);
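
One simple way to fill the table is a doubling INSERT (a sketch; it must stop before the AUTO_INCREMENT value reaches 4000, otherwise the statement fails with "Table has no partition for value ..."):

mysql> INSERT INTO test (data) VALUES ('row 1');
mysql> INSERT INTO test (data) SELECT data FROM test;  -- repeat until enough rows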

After filling in some data I had a look at the execution plan:

mysql> EXPLAIN PARTITIONS
SELECT MAX(ts)
  FROM test
 WHERE id BETWEEN 2000 AND 3999;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | test  | p2,p3      | range | PRIMARY       | PRIMARY | 4       | NULL |    2 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+-------------+

So it seems even this works. I am not sure where partitioned PBXT MEMORY tables would make sense. But who knows...

It would be interesting now to be able to tell MySQL that it should use PBXT MEMORY tables instead of MySQL MEMORY/MyISAM tables...

Attachments

  • pbxt_memory_test.sql (7.51 KB)
  • pbxt_memory_test_avg_row_len.sql (6.77 KB)

Comments

Normal PBXT makes guesses as to which rows should be stored in fixed format and which varsized. It is possible that for memory tables this guess is wrong. I forget the syntax to change it, but it is possible. Perhaps these default guesses should be different for in-memory tables too.
Stewart Smith

Hi Stewart, thanks for your comment. Paul came to similar conclusions: the problem could be that the fixed length record component is too large. PBXT tries to estimate the average row length of a table when it is created. It then sets the fixed length record component to the estimated average row length value. If the estimate is not good, you can set it explicitly by setting AVG_ROW_LENGTH. Optimizing the average row length is explained in my talk. For the moment you can just try AVG_ROW_LENGTH = 12. This will place all the data in the extended record component, which optimizes memory usage. So I have to dive into this again and do some more investigations. Cheers, Oli
oli
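
For reference, the estimate Paul refers to is set via the standard AVG_ROW_LENGTH table option, e.g. on the table from above:

mysql> ALTER TABLE pbxt_memory AVG_ROW_LENGTH = 12;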

We dug further into this problem of too big rows. You can find the results below.
+----------------------------+--------+------------+--------+----------------+-------------+--------------+--------------------+
| Name                       | Engine | Row_format | Rows   | Avg_row_length | Data_length | Index_length | Create_options     |
+----------------------------+--------+------------+--------+----------------+-------------+--------------+--------------------+
| t_innodb_varchar           | InnoDB | Compact    | 524833 |             37 |    19447808 |      7880704 |                    |
| t_innodb_varcharlong       | InnoDB | Compact    | 524833 |             37 |    19447808 |      7880704 |                    |
| t_innodb_char              | InnoDB | Compact    | 524403 |             99 |    51986432 |      7880704 |                    |
| t_innodb_tinytext          | InnoDB | Compact    | 524833 |             37 |    19447808 |      7880704 |                    |
| t_innodb_smalltext         | InnoDB | Compact    | 524833 |             37 |    19447808 |      7880704 |                    |
| t_innodb_mediumtext        | InnoDB | Compact    | 524833 |             37 |    19447808 |      7880704 |                    |
| t_innodb_longtext          | InnoDB | Compact    | 524833 |             37 |    19447808 |      7880704 |                    |
| t_memory_char              | MEMORY | Fixed      | 524288 |             73 |    40034552 |      8548880 |                    |
| t_memory_varchar           | MEMORY | Fixed      | 524288 |             74 |    40034552 |      8548880 |                    |
| t_myisam_varchar           | MyISAM | Dynamic    | 524288 |             20 |    10485760 |     10741760 |                    |
| t_myisam_varcharlong       | MyISAM | Dynamic    | 524288 |             20 |    10485760 |     10741760 |                    |
| t_myisam_char              | MyISAM | Fixed      | 524288 |             73 |    38273024 |     10728448 |                    |
| t_myisam_tinytext          | MyISAM | Dynamic    | 524288 |             20 |    10485760 |     10741760 |                    |
| t_myisam_smalltext         | MyISAM | Dynamic    | 524288 |             20 |    10485760 |     10741760 |                    |
| t_myisam_mediumtext        | MyISAM | Dynamic    | 524288 |             24 |    12582912 |     10741760 |                    |
| t_myisam_longtext          | MyISAM | Dynamic    | 524288 |             24 |    12582912 |     10741760 |                    |
| t_pbxt_varchar             | PBXT   | Dynamic    | 524288 |             88 |    46138368 |     12652544 |                    |
| t_pbxt_varcharlong         | PBXT   | Dynamic    | 524288 |            153 |    80217088 |     12652544 |                    |
| t_pbxt_char                | PBXT   | Fixed      | 524288 |             87 |    45614080 |     12652544 |                    |
| t_pbxt_tinytext            | PBXT   | Dynamic    | 524288 |             88 |    46138368 |     12652544 |                    |
| t_pbxt_smalltext           | PBXT   | Dynamic    | 524288 |            527 |   276300800 |     12652544 |                    |
| t_pbxt_mediumtext          | PBXT   | Dynamic    | 524288 |            527 |   276300800 |     12652544 |                    |
| t_pbxt_longtext            | PBXT   | Dynamic    | 524288 |            527 |   276300800 |     12652544 |                    |
| t_pbxt_memory_varchar      | PBXT   | Dynamic    | 524288 |             88 |    46138368 |     12652544 |                    |
| t_pbxt_memory_varcharlong  | PBXT   | Dynamic    | 524288 |            153 |    80217088 |     12652544 |                    |
| t_pbxt_memory_char         | PBXT   | Fixed      | 524288 |             87 |    45614080 |     12652544 |                    |
| t_pbxt_memory_tinytext     | PBXT   | Dynamic    | 524288 |             88 |    46138368 |     12652544 |                    |
| t_pbxt_memory_smalltext    | PBXT   | Dynamic    | 524288 |            527 |   276300800 |     12652544 |                    |
| t_pbxt_memory_mediumtext   | PBXT   | Dynamic    | 524288 |            527 |   276300800 |     12652544 |                    |
| t_pbxt_memory_longtext     | PBXT   | Dynamic    | 524288 |            527 |   276300800 |     12652544 |                    |
| t_pbxt_varchar_arl_4       | PBXT   | Dynamic    | 524288 |             26 |    13632512 |     12652544 | avg_row_length=4   |
| t_pbxt_varchar_arl_8       | PBXT   | Dynamic    | 524288 |             26 |    13632512 |     12652544 | avg_row_length=8   |
| t_pbxt_varchar_arl_16      | PBXT   | Dynamic    | 524288 |             30 |    15729664 |     12652544 | avg_row_length=16  |
| t_pbxt_varchar_arl_32      | PBXT   | Dynamic    | 524288 |             46 |    24118272 |     12652544 | avg_row_length=32  |
| t_pbxt_varchar_arl_64      | PBXT   | Dynamic    | 524288 |             78 |    40895488 |     12652544 | avg_row_length=64  |
| t_pbxt_varchar_arl_128     | PBXT   | Dynamic    | 524288 |             88 |    46138368 |     12652544 | avg_row_length=128 |
| t_pbxt_varcharlong_arl_4   | PBXT   | Dynamic    | 524288 |             26 |    13632512 |     12652544 | avg_row_length=4   |
| t_pbxt_varcharlong_arl_8   | PBXT   | Dynamic    | 524288 |             26 |    13632512 |     12652544 | avg_row_length=8   |
| t_pbxt_varcharlong_arl_16  | PBXT   | Dynamic    | 524288 |             30 |    15729664 |     12652544 | avg_row_length=16  |
| t_pbxt_varcharlong_arl_32  | PBXT   | Dynamic    | 524288 |             46 |    24118272 |     12652544 | avg_row_length=32  |
| t_pbxt_varcharlong_arl_64  | PBXT   | Dynamic    | 524288 |             78 |    40895488 |     12652544 | avg_row_length=64  |
| t_pbxt_varcharlong_arl_128 | PBXT   | Dynamic    | 524288 |            142 |    74449920 |     12652544 | avg_row_length=128 |
| t_pbxt_tinytext_arl_4      | PBXT   | Dynamic    | 524288 |             26 |    13632512 |     12652544 | avg_row_length=4   |
| t_pbxt_tinytext_arl_8      | PBXT   | Dynamic    | 524288 |             26 |    13632512 |     12652544 | avg_row_length=8   |
| t_pbxt_tinytext_arl_16     | PBXT   | Dynamic    | 524288 |             30 |    15729664 |     12652544 | avg_row_length=16  |
| t_pbxt_tinytext_arl_32     | PBXT   | Dynamic    | 524288 |             46 |    24118272 |     12652544 | avg_row_length=32  |
| t_pbxt_tinytext_arl_64     | PBXT   | Dynamic    | 524288 |             78 |    40895488 |     12652544 | avg_row_length=64  |
| t_pbxt_tinytext_arl_128    | PBXT   | Dynamic    | 524288 |            142 |    74449920 |     12652544 | avg_row_length=128 |
| t_pbxt_longtext_arl_4      | PBXT   | Dynamic    | 524288 |             26 |    13632512 |     12652544 | avg_row_length=4   |
| t_pbxt_longtext_arl_8      | PBXT   | Dynamic    | 524288 |             26 |    13632512 |     12652544 | avg_row_length=8   |
| t_pbxt_longtext_arl_16     | PBXT   | Dynamic    | 524288 |             30 |    15729664 |     12652544 | avg_row_length=16  |
| t_pbxt_longtext_arl_32     | PBXT   | Dynamic    | 524288 |             46 |    24118272 |     12652544 | avg_row_length=32  |
| t_pbxt_longtext_arl_64     | PBXT   | Dynamic    | 524288 |             78 |    40895488 |     12652544 | avg_row_length=64  |
| t_pbxt_longtext_arl_128    | PBXT   | Dynamic    | 524288 |            142 |    74449920 |     12652544 | avg_row_length=128 |
+----------------------------+--------+------------+--------+----------------+-------------+--------------+--------------------+

It looks like PBXT calculates its internal AVG_ROW_LENGTH from the data type and NOT from the contents. PBXT seems to lean towards the maximum possible size ('optimistic'). When you specify AVG_ROW_LENGTH explicitly, PBXT follows that value quite strictly. In practice this algorithm is IMHO not optimal. We cannot assume that the majority of users know their data that well, and analysing the data to save some space is not feasible in most cases. And this is not about a few percent of disk space saved: it is a factor of 5 to 26! IMHO PBXT should choose a more pessimistic approach when estimating the row length. From practice I would say that users very often specify VARCHAR(255) but in reality store only 20 - 40 characters in the field. Another interesting topic to investigate would be the impact on indexes...
oli

Further discussion we had about partitioned PBXT MEMORY tables: Q: Is there a good case where partitioned PBXT MEMORY tables could make sense? A: Having some partitions on disk and others in memory could be useful. Unfortunately MySQL (up to 5.6.1) and MariaDB (up to 5.2.3) do not yet support mixing different storage engines in a partitioned table:
ERROR 1497 (HY000): The mix of handlers in the partitions is not allowed in this version of MySQL
Lit: CREATE TABLE
oli