Skip navigation.
Home

Languages

RSS Feed FromDual on Linked In Recommend us on Xing Join us on Facebook Google+ Flickr Contact us

Transactional memory resident tables with PBXT

Introduction

In his presentation about PBXT at the DOAG Conference 2010 Paul McCullagh was speaking about memory resident PBXT tables. They will be available in version 1.1 of the PBXT Storage Engine Plug-in. Memory resident PBXT tables should have similar characteristics like normal MySQL MEMORY tables. But in addition to the MEMORY tables they are transactional and can handle BLOB and TEXT attributes.

An alternative to this feature would be the MySQL dynamic heap row patches provided by eBay and hosted at code.google.com.

In the following blog article we go through the installation of the PBXT Storage Engine Plug-in version 1.1 and how to use PBXT MEMORY resident tables.

Download

I have not found binaries yet. So we have to compile them ourself. Download is pretty simple:

shell> bzr branch lp:pbxt/1.1

Build

To build the pluggable PBXT Storage Engine we have to do the typical 3 steps: configure; make; make install. But there are some pitfalls to circumvent:

shell> ./configure --with-mysql=/home/mysql/source/mysql-5.1.53 --prefix=/home/mysql/product/mysql-5.1.53
shell> make

Pitfalls:

  • If you do NOT specify the --with-mysql option you will get the following error message:
    checking for mysql source code... configure: error: "no MySQL source found at /home/mysql/tmp"
  • If you did NOT run configure and make on those MySQL sources before you will get error messages as follows:
    checking for mysql source code... configure: error: "no Makefile found in source/mysql-5.1.53"
    include/my_global.h:80:23: error: my_config.h: No such file or directory                                              
    include/my_global.h:815:2: error: #error "please add -DSTACK_DIRECTION=1 or -1 to your CPPFLAGS"

    and compilation will fail. You have to do first at least a:

    shell> ./configure
    shell> make

    on the MySQL sources to be capable to compile the PBXT Storage Engine against them!

  • If you consider to use different configure options a second time, make sure you remove and download the sources from scratch again. Otherwise you may get results which are not expected:
    "prefix is not considered during make install" https://bugs.launchpad.net/bugs/677802>

Install

If you run:
shell> sudo make install

the files should be copied to their according locations. Then load the pluggable Storage Engine into MySQL:

mysql> INSTALL PLUGIN PBXT SONAME 'libpbxt.so';

If you get an error message complaining that MySQL cannot find the Storage Engine plug-in you should set your plugin_dir accordingly in your my.cnf:

mysql> SHOW GLOBAL VARIABLES LIKE 'plugin_dir';
+---------------+---------------------------------------------------+
| Variable_name | Value                                             |
+---------------+---------------------------------------------------+
| plugin_dir    | /home/mysql/product/mysql-5.1.53/lib/mysql/plugin |
+---------------+---------------------------------------------------+

Verify

Then, after successful loading, you can see your PBXT Storage Engine:

mysql> SHOW PLUGINS;
+------------+----------+----------------+------------+---------+
| Name       | Status   | Type           | Library    | License |
+------------+----------+----------------+------------+---------+
...
| PBXT       | ACTIVE   | STORAGE ENGINE | libpbxt.so | GPL     |
+------------+----------+----------------+------------+---------+

With the following command you can see which version is currently loaded:

mysql> SHOW ENGINE PBXT STATUS\G
*************************** 1. row ***************************
  Type: PBXT
  Name:
Status:
101120 13:48:07 PBXT 1.1.01 RC STATUS OUTPUT
...

Or from the INFORMATION_SCHEMA, which gives IMHO the wrong information:

mysql> SELECT plugin_name, plugin_version, plugin_type_version, plugin_library_version
  FROM INFORMATION_SCHEMA.plugins
 WHERE plugin_name = 'PBXT';
+-------------+----------------+---------------------+------------------------+
| plugin_name | plugin_version | plugin_type_version | plugin_library_version |
+-------------+----------------+---------------------+------------------------+
| PBXT        | 0.1            | 50153.0             | 1.0                    |
+-------------+----------------+---------------------+------------------------+

Bug #677839: Plugin version is not correctly reported in information_schema

Tests

For the first simple tests we use the MyEnv test table:

mysql> source /home/mysql/myenv/sql/test_table.sql
mysql> alter table test engine = pbxt storage memory;

Unfortunately I have not found yet a way to see if a table is a PBXT MEMORY table or not. Neither with SHOW CREATE TABLE nor with with SHOW TABLE STATUS or SELECT * FROM INFORMATION_SCHEMA.tables;.

The only way I found was to look at the file system:

mysql> system ls -la
-rw-rw---- 1 mysql dba    4 2010-11-20 15:32 test-3.xtr
-rw-rw---- 1 mysql dba 8612 2010-11-20 15:32 test.frm
-rw-rw---- 1 mysql dba  155 2010-11-20 15:32 test.xtd
-rw-rw---- 1 mysql dba 4096 2010-11-20 15:32 test.xti

mysql> ALTER TABLE test ENGINE = PBXT STORAGE MEMORY;

mysql> system ls -la
-rw-rw---- 1 mysql dba 8612 2010-11-20 15:32 test.frm

An other possibility is:

mysql> system cat ./pbxt/tables
[table]
id=37
name=pbxt
location=./test
storage=disk
type=0
[table]
id=42
name=pbxt_memory_blob
location=./test
storage=heap
type=0

Playing around with the table I have NOT found a way to convert a PBXT MEMORY table back to a DISK table yet. Both did NOT work:

mysql> ALTER TABLE test ENGINE=PBXT;
mysql> ALTER TABLE test ENGINE=PBXT STORAGE DISK;

Memory usage of PBXT MEMORY tables

What first interested me with MEMORY tables was their actual memory consumption. MySQL MEMORY tables make fixed sized rows. So I was wondering how PBXT MEMORY table behave.

5 different tables with exactly the same content:

mysql> SHOW TABLE STATUS;
+------------------+--------+------------+---------+----------------+-------------+--------------+
| Name             | Engine | Row_format | Rows    | Avg_row_length | Data_length | Index_length |
+------------------+--------+------------+---------+----------------+-------------+--------------+
| memory           | MEMORY | Fixed      | 1048576 |             74 |    80069616 |      8549392 |
| pbxt             | PBXT   | Dynamic    | 1048576 |             88 |    92275712 |     12619776 |
| pbxt_memory      | PBXT   | Dynamic    | 1048576 |             88 |    92275712 |     12619776 |
| test             | MyISAM | Dynamic    | 1048576 |             20 |    20971520 |     10762240 |
| pbxt_memory_blob | PBXT   | Dynamic    | 1048576 |            527 |   552600576 |     12619776 |
+------------------+--------+------------+---------+----------------+-------------+--------------+

PBXT claims for its memory table to have a dynamic row format. But it seems that for a VARCHAR it allocates the full length: 88 byte (14-26 byte header + 4 byte INT UNSIGNED, 64 VARCHAR + 4 byte TIMESTAMP).

For the table with the LONGTEXT field it looks even much worse. For a LONGTEXT there is some additional memory allocated for what ever reason...
So MEMORY tables seem to work with PBXT but they allocate roughly 6 times more memory when we use BLOB/TEXT attributes compared to VARCHAR. Possibly some more investigation has to be done here...

To limit MySQL MEMORY tables in growth there is a parameter called max_heap_table_size. With this behaviour of PBXT MEMORY tables it would be desirable to have a similar parameter to restrict PBXT MEMORY tables. I suggest: pbxt_max_memory_table_size?

Transactions

To find if transactions work as expected we run some simple tests. In the first example we kill a DELETE statement from an other session:

mysql> SELECT COUNT(*) FROM pbxt_memory;
+----------+
| count(*) |
+----------+
|  1048576 |
+----------+
1 row in set (0.73 sec)

mysql> DELETE FROM pbxt_memory;
ERROR 1317 (70100): Query execution was interrupted

mysql> SELECT COUNT(*) FROM pbxt_memory;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    9
Current database: test

+----------+
| count(*) |
+----------+
|  1048576 |
+----------+
1 row in set (0.71 sec)

What we can see is that the count takes longer than with a MEMORY table. But this can be explained because PBXT has to span a transaction and really has to count the data. Where MEMORY tables know always instantaneous, because of their non-parallel write behaviour, how many rows a table contains.

In the next simple example we start a transaction, delete some rows and roll back:

mysql> START TRANSACTION;

mysql> DELETE FROM pbxt_memory;
Query OK, 1048576 rows affected (11.48 sec)

mysql> SELECT COUNT(*) FROM pbxt_memory;
+----------+
| count(*) |
+----------+
|        0 |
+----------+

mysql> ROLLBACK;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT COUNT(*) FROM pbxt_memory;
+----------+
| count(*) |
+----------+
|  1048576 |
+----------+
1 row in set (12.02 sec)

Interesting behaviour we can see when we look at the timing of the commands...

Indexing

For unique in memory data structures, hash indexes perform better than tree indexes. With MEMORY tables there are the HASH indexes for this purpose.
When we try the same thing with PBXT MEMORY tables it seems to work:

mysql> ALTER TABLE pbxt_memory ADD INDEX (ts) USING HASH;
Query OK, 1048576 rows affected (39.42 sec)
Records: 1048576  Duplicates: 0  Warnings: 0

mysql> SHOW CREATE TABLE pbxt_memory\G
*************************** 1. row ***************************
       Table: pbxt_memory
Create Table: CREATE TABLE `pbxt_memory` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `data` varchar(64) DEFAULT NULL,
  `ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `ts` (`ts`) USING HASH
) ENGINE=PBXT DEFAULT CHARSET=latin1
1 row in set (0.02 sec)

But Paul told me that PBXT does not know about HASH indexes. So I would expect at least a warning if not even an error message...

Partitioning with PBXT MEMORY tables

Just by accident I had the idea of creating a partitioned PBXT MEMORY table:

CREATE TABLE test (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, data VARCHAR(64)
, ts TIMESTAMP
)
ENGINE = PBXT STORAGE MEMORY
PARTITION BY RANGE( id )
(
  PARTITION p0 VALUES LESS THAN (1000)
, PARTITION p1 VALUES LESS THAN (2000)
, PARTITION p2 VALUES LESS THAN (3000)
, PARTITION p3 VALUES LESS THAN (4000)
);

After filling some data in I had a look at the execution plan:

mysql> EXPLAIN PARTITIONS
SELECT MAX(ts)
  FROM test
 WHERE id BETWEEN 2000 AND 3999;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | test  | p2,p3      | range | PRIMARY       | PRIMARY | 4       | NULL |    2 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+-------------+

So it seems event this works. I am not sure where partitioned PBXT MEMORY tables would make sense. But who knows...

It would be interesting now to tell MySQL that it should use PBXT MEMORY tables instead of MySQL MEMORY/MYISAM tables...

Attachment Size
pbxt_memory_test.sql 7.51 KB
pbxt_memory_test_avg_row_len.sql 6.77 KB

Partitioning with PBXT MEMORY tables

Further discussion we had about PBXT MEMORY tables partitioned:

Q: Is there a good case where partitioned PBXT MEMORY tables could make sense?

A: To have some partitions on disk and other in memory could be useful.

Unfortunately MySQL (up to 5.6.1) and MariaDB (up to 5.2.3) do not support mixing different Storage Engines for a partitioned table yet:

ERROR 1497 (HY000): The mix of handlers in the partitions is not allowed in this version of MySQL

Lit: CREATE TABLE

Normal PBXT makes guesses as

Normal PBXT makes guesses as to what rows should be stored fixed format and what varsized.

It is possible that for memory tables, this guess is wrong.

I forget the syntax to change it, but it is possible.

perhaps these default guesses should be different for in memory tables too.

Results from using PBXT with AVG_ROW_LENGTH

We dug some further with this problem of too big rows. The results you can find below.

Name Engine Row_format Rows Avg_row_length Data_length Index_length Create_options
t_innodb_varchar InnoDB Compact 524833 37 19447808 7880704
t_innodb_varcharlong InnoDB Compact 524833 37 19447808 7880704
t_innodb_char InnoDB Compact 524403 99 51986432 7880704
t_innodb_tinytext InnoDB Compact 524833 37 19447808 7880704
t_innodb_smalltext InnoDB Compact 524833 37 19447808 7880704
t_innodb_mediumtext InnoDB Compact 524833 37 19447808 7880704
t_innodb_longtext InnoDB Compact 524833 37 19447808 7880704
t_memory_char MEMORY Fixed 524288 73 40034552 8548880
t_memory_varchar MEMORY Fixed 524288 74 40034552 8548880
t_myisam_varchar MyISAM Dynamic 524288 20 10485760 10741760
t_myisam_varcharlong MyISAM Dynamic 524288 20 10485760 10741760
t_myisam_char MyISAM Fixed 524288 73 38273024 10728448
t_myisam_tinytext MyISAM Dynamic 524288 20 10485760 10741760
t_myisam_smalltext MyISAM Dynamic 524288 20 10485760 10741760
t_myisam_mediumtext MyISAM Dynamic 524288 24 12582912 10741760
t_myisam_longtext MyISAM Dynamic 524288 24 12582912 10741760
t_pbxt_varchar PBXT Dynamic 524288 88 46138368 12652544
t_pbxt_varcharlong PBXT Dynamic 524288 153 80217088 12652544
t_pbxt_char PBXT Fixed 524288 87 45614080 12652544
t_pbxt_tinytext PBXT Dynamic 524288 88 46138368 12652544
t_pbxt_smalltext PBXT Dynamic 524288 527 276300800 12652544
t_pbxt_mediumtext PBXT Dynamic 524288 527 276300800 12652544
t_pbxt_longtext PBXT Dynamic 524288 527 276300800 12652544
t_pbxt_memory_varchar PBXT Dynamic 524288 88 46138368 12652544
t_pbxt_memory_varcharlong PBXT Dynamic 524288 153 80217088 12652544
t_pbxt_memory_char PBXT Fixed 524288 87 45614080 12652544
t_pbxt_memory_tinytext PBXT Dynamic 524288 88 46138368 12652544
t_pbxt_memory_smalltext PBXT Dynamic 524288 527 276300800 12652544
t_pbxt_memory_mediumtext PBXT Dynamic 524288 527 276300800 12652544
t_pbxt_memory_longtext PBXT Dynamic 524288 527 276300800 12652544
t_pbxt_varchar_arl_4 PBXT Dynamic 524288 26 13632512 12652544 avg_row_length=4
t_pbxt_varchar_arl_8 PBXT Dynamic 524288 26 13632512 12652544 avg_row_length=8
t_pbxt_varchar_arl_16 PBXT Dynamic 524288 30 15729664 12652544 avg_row_length=16
t_pbxt_varchar_arl_32 PBXT Dynamic 524288 46 24118272 12652544 avg_row_length=32
t_pbxt_varchar_arl_64 PBXT Dynamic 524288 78 40895488 12652544 avg_row_length=64
t_pbxt_varchar_arl_128 PBXT Dynamic 524288 88 46138368 12652544 avg_row_length=128
t_pbxt_varcharlong_arl_4 PBXT Dynamic 524288 26 13632512 12652544 avg_row_length=4
t_pbxt_varcharlong_arl_8 PBXT Dynamic 524288 26 13632512 12652544 avg_row_length=8
t_pbxt_varcharlong_arl_16 PBXT Dynamic 524288 30 15729664 12652544 avg_row_length=16
t_pbxt_varcharlong_arl_32 PBXT Dynamic 524288 46 24118272 12652544 avg_row_length=32
t_pbxt_varcharlong_arl_64 PBXT Dynamic 524288 78 40895488 12652544 avg_row_length=64
t_pbxt_varcharlong_arl_128 PBXT Dynamic 524288 142 74449920 12652544 avg_row_length=128
t_pbxt_tinytext_arl_4 PBXT Dynamic 524288 26 13632512 12652544 avg_row_length=4
t_pbxt_tinytext_arl_8 PBXT Dynamic 524288 26 13632512 12652544 avg_row_length=8
t_pbxt_tinytext_arl_16 PBXT Dynamic 524288 30 15729664 12652544 avg_row_length=16
t_pbxt_tinytext_arl_32 PBXT Dynamic 524288 46 24118272 12652544 avg_row_length=32
t_pbxt_tinytext_arl_64 PBXT Dynamic 524288 78 40895488 12652544 avg_row_length=64
t_pbxt_tinytext_arl_128 PBXT Dynamic 524288 142 74449920 12652544 avg_row_length=128
t_pbxt_longtext_arl_4 PBXT Dynamic 524288 26 13632512 12652544 avg_row_length=4
t_pbxt_longtext_arl_8 PBXT Dynamic 524288 26 13632512 12652544 avg_row_length=8
t_pbxt_longtext_arl_16 PBXT Dynamic 524288 30 15729664 12652544 avg_row_length=16
t_pbxt_longtext_arl_32 PBXT Dynamic 524288 46 24118272 12652544 avg_row_length=32
t_pbxt_longtext_arl_64 PBXT Dynamic 524288 78 40895488 12652544 avg_row_length=64
t_pbxt_longtext_arl_128 PBXT Dynamic 524288 142 74449920 12652544 avg_row_length=128

It looks like PBXT calculates its internal AVG_ROW_LENGTH from the data type and NOT from the contents. PBXT seems to go more to the maximum possible size ('optimistic'). When you specified AVG_ROW_LENGTH, PBXT strictly follows the values more strictly.

In practice this alogrithm IMHO is not optimial. We cannot assume that the majority of the users know about their data. And analysing the data to safe some space is not feasible in most cases. And this not because of a few percent of disk space safed. It is factor 5 to 26!!!

IMHO PBXT should choos a more pessimistic approach in estimating row length. From the practice I would say that very often the users specify VARCHAR(255) but in reality they have only 20 - 40 characters stored in the field.

An other intersing topic to investigate would be the impact on indexes...

AVG_ROW_LENGTH = 12

Hi Stewart,

Thanks for your comment. Paul made some similar conclusions:

The problem could be that the fixed length record component is too large. PBXT tries to estimate the average row length of a table when it is created. It then sets the fixed length record component to the estimated average row length value.

If the estimate is not good, you can set it explicitly by setting AVG_ROW_LENGTH.

Optimizing the average row length is explained in my talk.

For the moment you can just try AVG_ROW_LENGTH = 12. This will place all the data in the extended record component, which optimizes memory usage.

So I have to dive into this again and do some more investigations.

Cheers,
Oli