You are here

Setting the right GCache size in Galera Cluster

One of our customers had a question related to the right value of Galera Cache size (gcache.size) in Galera Cluster for MySQL which I would like to share with you.

The question was: My maintenance window takes 4 hours for my 5TB DB. How can I avoid an SST ?!

Basically, having too small GCache size will lead to SST (Snapshot State Transfer) instead of IST (Incremental State Transfer), thus we can avoid the SST by setting the GCache to the appropriate value.

To check the current value of the GCache size:

mysql> SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options'\G Variable_name: wsrep_provider_options Value: base_host = 192.168.1.12; . . . gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; . . .

The value of GCache size could be changed by adding the following line in the my.cnf file and restarting the node (it could NOT be changed online):

#my.cnf [mysqld] wsrep_provider_options="gcache.size=256M"

But the question is how can we calculate the right value for GCache size to cover the maintenance window and at the same time not larger than what it needs?

To answer that question we should first find out how much GCache can handle which could be calculated by the following formula:

Hold time = GCache size / Replication Rate.

Where:

  • Replication Rate = Amount of replicated data / time.
  • Amount of replicated data = (wsrep_replicated_bytes + wsrep_received_bytes) after the maintenance window - (wsrep_replicated_bytes + wsrep_received_bytes) before the maintenance window.

The amount of replicated data for the customer's case = 7200MB.

Now, we can find out how much GCache (default 128M) can handle for the customer's case:

Hold time = 128MB / (7200MB / 4h) = 128MB / 0.5 MB = 256s.

Then, we can calculate the right GCache size value to handle the maintenance window by the following formula:
GCache = Maintenance window * Replication Rate = 14400s * 0.5 MB.
GCache = 7200MB.

In other words, the right GCache size should be equivalent to (or not less than) the amount of replicated data.

A shorter way using the binary logs size

Is there any relation between (wsrep_replicated_bytes + wsrep_received_bytes) and the binary log traffic? Lets check that by the following test scenario:

  • Starting a Galera Cluster where the newest binary log file size will be empty (120 bytes) and the same for the status variables:
    mysql> SHOW GLOBAL STATUS LIKE 'wsrep_replicated_bytes'; +------------------------+-------+ | Variable_name | Value | +------------------------+-------+ | wsrep_replicated_bytes | 0 | +------------------------+-------+ 1 row in set (0.00 sec) mysql> SHOW GLOBAL STATUS LIKE 'wsrep_received_bytes'; +----------------------+-------+ | Variable_name | Value | +----------------------+-------+ | wsrep_received_bytes | 368 | +----------------------+-------+ 1 row in set (0.00 sec)
  • Execute some DML statements on the current node (to increase wsrep_replicated_bytes) and on another node (to increase wsrep_received_bytes), then check the status values:
    mysql> SHOW GLOBAL STATUS LIKE 'wsrep_replicated_bytes'; +------------------------+----------+ | Variable_name | Value | +------------------------+----------+ | wsrep_replicated_bytes | 80125192 | +------------------------+----------+ 1 row in set (0.00 sec) mysql> SHOW GLOBAL STATUS LIKE 'wsrep_received_bytes'; +------------------------+----------+ | Variable_name | Value | +------------------------+----------+ | wsrep_received_bytes | 40062948 | +------------------------+----------+ 1 row in set (0.00 sec)

    The Amount of replicated data = (80125192 + 40062948) - (0 + 368) = 120187772 Bytes.

  • Checking the increase of the binary log file:
    shell> ll /var/lib/mysql/mysql-bin.000243 -rw-rw---- 1 mysql mysql 113769454 Mar 20 13:42 mysql-bin.000243
  • Notes:

    • The variable log_slave_updates MUST to be enabled on ALL nodes, otherwise, the option wsrep_received_bytes will not be reflected on the binary logs, thus will lead to WRONG calculations!!
    • Since the cluster is freshly started, MySQL started to write into a new binary log file while in an already running cluster we can force MySQL to start from new binary log file by issuing the SQL command "FLUSH BINARY LOGS;" so that we can use the total size of the binary logs generated after that during the maintenance.

    Conclusion:

    Although the binary log traffic will be always less than the amount of replicated data but they are nearly close, thus we can use it to get a rough estimation about the right GCache size value, hence the formula will be as follows:

    • GCache size = Maintenance window * Replication Rate.
    • GCache size = (wsrep_replicated_bytes + wsrep_received_bytes) after the maintenance window - (wsrep_replicated_bytes + wsrep_received_bytes) before the maintenance window.
    • GCache size = binary log traffic which occurs during the maintenance window.