Keep your Galera Cluster up and running by all means

We quite often see customers complaining that their Galera Cluster is not stable and "crashes" from time to time. As always, one has to investigate before judging.

What quite often turns out is that the customer (or rather their developers) is running huge transactions.

In general, transactional databases do NOT like huge transactions, for various reasons (MVCC, ROLLBACK, UNDO, locking, etc.). They can handle them. But they are not very good at it and they do not like it. Instead, you should rather run many smaller transactions, which you can execute in parallel to keep up the throughput. But: this means more work for whoever has to implement these transactions and it requires more intelligence in the code...

Galera Cluster itself has some hard limits:

SQL> SHOW GLOBAL VARIABLES LIKE 'wsrep%ws%';
+-------------------+------------+
| Variable_name     | Value      |
+-------------------+------------+
| wsrep_max_ws_rows | 0          |
| wsrep_max_ws_size | 2147483647 |
+-------------------+------------+

The variable wsrep_max_ws_rows defines how many rows are allowed at maximum in one write set (~ transaction). Zero means NO limit. In older versions this value was set by default to 128k rows, as far as I can remember. The variable wsrep_max_ws_size defines the maximum size of a write set in bytes. This is 2 GiB. In older versions this value was set by default to 1 GiB. Bigger is NOT possible unless you are using Streaming Replication. For the disadvantages of Streaming Replication see further down.
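
If you want a rough idea of how close your transactions come to these limits, you can relate the replication status counters to each other. This is just a sketch, assuming MariaDB, where the counters are exposed in information_schema.GLOBAL_STATUS; it shows the average size of the write sets which originated on this node since the counters were last reset:

SQL> SELECT ROUND(
         (SELECT variable_value FROM information_schema.GLOBAL_STATUS
           WHERE variable_name = 'wsrep_replicated_bytes')
       / (SELECT variable_value FROM information_schema.GLOBAL_STATUS
           WHERE variable_name = 'wsrep_replicated')
     ) AS avg_ws_size_bytes;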

Do small transactions

So the technically sound way to handle this situation is to make smaller transactions. We suggest a batch size of about 100 to 10'000 rows per batch (= transaction). This causes much less stress on the database. And the chance of bringing your Galera Cluster out of service is much smaller. If you run these smaller batches every, let us say, 10 seconds (to give the Galera Cluster a chance to breathe), you can delete between 864k and 86.4 Mio rows per day (8640 batches per day times 100 or 10'000 rows per batch). This should be sufficient in most cases.
Something like:

START TRANSACTION;
DELETE FROM trash
 WHERE creation_date < DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 10 YEAR)
 LIMIT 1000;
COMMIT;

should do the job.
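
If you need to repeat this until everything old is purged, a minimal sketch could look as follows. The procedure name purge_trash is made up here, and it assumes autocommit is enabled so that every single DELETE becomes its own transaction and thus its own write set:

DELIMITER //

CREATE PROCEDURE purge_trash()
BEGIN
  DECLARE rows_deleted INT DEFAULT 1;

  WHILE rows_deleted > 0 DO
    DELETE FROM trash
     WHERE creation_date < DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 10 YEAR)
     LIMIT 1000;
    SET rows_deleted = ROW_COUNT();   -- rows affected by the last DELETE

    DO SLEEP(10);                     -- give the Galera Cluster a chance to breathe
  END WHILE;
END //

DELIMITER ;

You would then run CALL purge_trash(); from a maintenance job, for example via cron or the event scheduler.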

Unfortunately this most often has to be done and/or implemented by the developers. And they are not aware of the issue and/or do not want to listen to DBAs, admins or consultants. What about the DevOps approach here? Can anybody explain to me again what DevOps means exactly??? Responsibility for operations by developers?

Force small transactions

If you are responsible for a Galera Cluster and if it is your duty or your honour to keep it up and running by all means, you can first try to educate the people in charge of those transactions. But if all your effort does not help, you possibly have to unwrap the big hammer.

This method is especially useful in situations where different applications are running on the same consolidated Galera Cluster and one application regularly tears down the whole Galera Cluster so that all the other applications suffer.

You can use the wsrep_max_ws_rows variable to limit the transaction size and hopefully keep your Galera Cluster up and running through this limitation.

SQL> SET GLOBAL wsrep_max_ws_rows = 10000;
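
To make this limit survive a restart you should also put it into the MariaDB/MySQL configuration file. The file name and path below are just an example and may differ on your system:

# e.g. /etc/my.cnf.d/galera.cnf
[mysqld]
wsrep_max_ws_rows = 10000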

If you want to watch some status variables while experimenting, I suggest these:

shell> watch -d -n 1 "mysql -uroot --execute=\"SHOW GLOBAL STATUS WHERE variable_name LIKE 'wsrep_re%' OR variable_name LIKE 'innodb_rows%'\" | column -t"

Variable_name           Value
Innodb_rows_deleted     41003
Innodb_rows_inserted    2097152
Innodb_rows_read        4224307
Innodb_rows_updated     0
wsrep_replicated        5
wsrep_replicated_bytes  1936
wsrep_repl_keys         9
wsrep_repl_keys_bytes   192
wsrep_repl_data_bytes   1419
wsrep_repl_other_bytes  0
wsrep_received          62
wsrep_received_bytes    99388160
wsrep_ready             ON

Side note: what we found while experimenting is that FLUSH STATUS seems to be replicated to the other nodes...

So for transactions of 10'000 rows or less it works fine:

SQL> DELETE FROM test LIMIT 10000;
Query OK, 10000 rows affected (0.066 sec)

If you go above this limit your application gets an error:

SQL> DELETE FROM test LIMIT 10001;
ERROR 1180 (HY000): wsrep_max_ws_rows exceeded

and a ROLLBACK is done but the Galera Cluster is not severely affected.

I hope this hard measure helps you to keep your Galera Cluster up and running a bit longer than before.

Disadvantages of Streaming Replication

  • Only works in Galera 4 and newer (MariaDB 10.4 and MySQL 8.0).
  • It is a "new" feature and not yet widely used. So you might hit some unknown bugs.
  • Streaming Replication increases the load on the node, which may adversely affect its performance. [ 1 ]
  • It is recommended that you only enable Streaming Replication at a session-level and then only for transactions that would not run correctly without it. [ 1 ]
  • The rollback operation consumes system resources on all nodes. When long-running write transactions frequently need to be rolled back, this can become a performance problem. [ 1 ]

Therefore, it is a good application design policy to use shorter transactions whenever possible. In the event that your application performs batch processing or scheduled housekeeping tasks, consider splitting these into smaller transactions in addition to using Streaming Replication. [ 1 ]
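
For completeness, enabling Streaming Replication at session level for one single big operation looks roughly like this (wsrep_trx_fragment_unit and wsrep_trx_fragment_size are the relevant Galera 4 variables; the fragment size of 10'000 rows is just an example value):

SQL> SET SESSION wsrep_trx_fragment_unit = 'rows';
SQL> SET SESSION wsrep_trx_fragment_size = 10000;

SQL> DELETE FROM trash
      WHERE creation_date < DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 10 YEAR);

SQL> SET SESSION wsrep_trx_fragment_size = 0;

Setting wsrep_trx_fragment_size back to 0 switches Streaming Replication off again for this session.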

IMHO Streaming Replication is a typical feature which was implemented because users want it and are too lazy to fix their stuff, and not because it technically makes sense.

Literature

