MariaDB MaxScale Load Balancer with Master/Master Replication

For this how-to we worked with a MariaDB 10.6.5 database server and the latest MariaDB MaxScale 6.3.0 (aka 2.6) load balancer/proxy.

As a starting point we had a working MariaDB Master/Master replication with one active Master (Ma) and one passive Master (Mp), where the passive Master was set to read_only. As replication user we used the user replication.

Creating database accounts for the MaxScale Load Balancer

The MaxScale load balancer connects to the database on behalf of the application users. To do this it first needs to retrieve the available users from the database. For this the MaxScale account needs the following privileges:

CREATE USER 'maxscale_admin'@'%' IDENTIFIED BY 'secret';

GRANT SELECT ON mysql.user TO 'maxscale_admin'@'%';
GRANT SELECT ON mysql.db TO 'maxscale_admin'@'%';
GRANT SELECT ON mysql.tables_priv TO 'maxscale_admin'@'%';
GRANT SELECT ON mysql.columns_priv TO 'maxscale_admin'@'%';
GRANT SELECT ON mysql.procs_priv TO 'maxscale_admin'@'%';
GRANT SELECT ON mysql.proxies_priv TO 'maxscale_admin'@'%';
GRANT SELECT ON mysql.roles_mapping TO 'maxscale_admin'@'%';
GRANT SHOW DATABASES ON *.* TO 'maxscale_admin'@'%';

For monitoring the replication and performing proper switchover and failover operations, MaxScale further needs a monitoring account:

CREATE USER 'maxscale_monitor'@'%' IDENTIFIED BY 'secret';

GRANT REPLICATION CLIENT ON *.* TO 'maxscale_monitor'@'%';
GRANT REPLICATION SLAVE ON *.* TO 'maxscale_monitor'@'%';
GRANT SUPER, RELOAD ON *.* TO 'maxscale_monitor'@'%';

And finally we need an application account for our test application:

CREATE USER 'app'@'%' IDENTIFIED BY 'secret';

GRANT ALL ON test.* TO 'app'@'%';
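
Whether the accounts were created with the intended privileges can quickly be verified with:

SQL> SHOW GRANTS FOR 'maxscale_admin'@'%';
SQL> SHOW GRANTS FOR 'maxscale_monitor'@'%';
SQL> SHOW GRANTS FOR 'app'@'%';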

Starting MariaDB MaxScale Load Balancer

Because on our systems we do not use the provided DEB/RPM packages but generic binary tarballs, which are not available for MaxScale, we have to start the MaxScale Load Balancer in a somewhat more complicated way:

# export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/home/mysql/product/maxscale/usr/lib/x86_64-linux-gnu/maxscale
# ldd ./maxscale
# ./maxscale --help

# ./maxscale --config=/home/mysql/etc/maxscale_mm.cnf --datadir=/home/mysql/tmp --basedir=/home/mysql/product/maxscale --logdir=/home/mysql/log --config-check

# ./maxscale --config=/home/mysql/etc/maxscale_mm.cnf --datadir=/home/mysql/tmp --basedir=/home/mysql/product/maxscale --logdir=/home/mysql/log
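
To avoid typing all of this every time, the start can be wrapped in a small shell script. This is only a sketch, assuming it is run from the directory that contains the maxscale binary and using the paths from above:

#!/bin/bash
# start_maxscale_mm.sh - sketch: start MaxScale from the generic binary tarball
# assumption: the current directory contains the maxscale binary

# the tarball libraries must be found at run time
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/home/mysql/product/maxscale/usr/lib/x86_64-linux-gnu/maxscale

./maxscale --config=/home/mysql/etc/maxscale_mm.cnf \
           --datadir=/home/mysql/tmp \
           --basedir=/home/mysql/product/maxscale \
           --logdir=/home/mysql/log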

The MaxScale configuration file we used in these tests looks as follows:

#
# maxscale_mm.cnf
#

[maxscale]
threads        = auto

[master1]
type           = server
address        = 192.168.1.11
port           = 3306
protocol       = MariaDBBackend

[master2]
type           = server
address        = 192.168.1.12
port           = 3306
protocol       = MariaDBBackend

[MultiMasterMonitor]
type           = monitor
module         = mariadbmon
servers        = master1,master2
user           = maxscale_monitor
password       = secret
enforce_read_only_slaves = true
auto_rejoin    = true
# auto_failover  = true

[WriteService]
type           = service
router         = readconnroute
router_options = master
servers        = master1,master2
user           = maxscale_admin
password       = secret

[ReadService]
type           = service
router         = readconnroute
router_options = slave
servers        = master1,master2
user           = maxscale_admin
password       = secret

[WriteListener]
type           = listener
service        = WriteService
protocol       = MariaDBClient
port           = 3306

[ReadListener]
type           = listener
service        = ReadService
protocol       = MariaDBClient
port           = 3307

Whether the start was successful can be seen for example with:

# ps -ef | grep maxscale
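
The MaxScale error log also shows whether the start-up went through cleanly (assuming the default log file name maxscale.log in the logdir configured above):

# tail /home/mysql/log/maxscale.log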

When you start MariaDB MaxScale for the first time, no user/password is needed. So we can connect to the MaxScale Load Balancer with the maxctrl client:

# ./maxctrl
maxctrl> list listeners
┌──────────────────┬──────┬──────┬─────────┬──────────────┐
│ Name             │ Port │ Host │ State   │ Service      │
├──────────────────┼──────┼──────┼─────────┼──────────────┤
│ WriteListener    │ 3306 │ ::   │ Running │ WriteService │
├──────────────────┼──────┼──────┼─────────┼──────────────┤
│ ReadListener     │ 3307 │ ::   │ Running │ ReadService  │
└──────────────────┴──────┴──────┴─────────┴──────────────┘

maxctrl> list services
┌──────────────┬───────────────┬─────────────┬───────────────────┬──────────────────┐
│ Service      │ Router        │ Connections │ Total Connections │ Servers          │
├──────────────┼───────────────┼─────────────┼───────────────────┼──────────────────┤
│ WriteService │ readconnroute │ 0           │ 0                 │ master1, master2 │
├──────────────┼───────────────┼─────────────┼───────────────────┼──────────────────┤
│ ReadService  │ readconnroute │ 0           │ 0                 │ master1, master2 │
└──────────────┴───────────────┴─────────────┴───────────────────┴──────────────────┘

maxctrl> list servers
┌─────────┬──────────────┬──────┬─────────────┬──────────────────────────────┬────────────┐
│ Server  │ Address      │ Port │ Connections │ State                        │ GTID       │
├─────────┼──────────────┼──────┼─────────────┼──────────────────────────────┼────────────┤
│ master1 │ 192.168.1.11 │ 3306 │ 0           │ Master, Running              │ 0-3308-134 │
├─────────┼──────────────┼──────┼─────────────┼──────────────────────────────┼────────────┤
│ master2 │ 192.168.1.12 │ 3306 │ 0           │ Relay Master, Slave, Running │ 0-3308-134 │
└─────────┴──────────────┴──────┴─────────────┴──────────────────────────────┴────────────┘

maxctrl> list sessions
┌────────┬──────┬─────────────────────┬──────────────────────────┬──────┬─────────────┐
│ Id     │ User │ Host                │ Connected                │ Idle │ Service     │
├────────┼──────┼─────────────────────┼──────────────────────────┼──────┼─────────────┤
│ 746468 │ app  │ ::ffff:192.168.1.99 │ Fri May 13 15:28:55 2022 │ 0.4  │ ReadService │
└────────┴──────┴─────────────────────┴──────────────────────────┴──────┴─────────────┘

To prevent everybody from accessing MariaDB MaxScale we set a password for the admin user:

maxctrl> alter user admin secret
OK

Then you can connect with username and password:

# ./maxctrl --user=admin --password=secret
maxctrl> list users
┌───────┬──────┬────────────┐
│ Name  │ Type │ Privileges │
├───────┼──────┼────────────┤
│ admin │ inet │ admin      │
└───────┴──────┴────────────┘

Testing connections over the MaxScale Load Balancer

To test whether connections end up on the correct database instances we used the following commands:

# mariadb --user=app --host=192.168.1.1 --port=3306 --password=secret test --execute='SELECT @@hostname, @@port'
# mariadb --user=app --host=192.168.1.1 --port=3307 --password=secret test --execute='SELECT @@hostname, @@port'
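
To see where a few consecutive read connections end up, the read port can also be queried in a loop (a quick sketch):

# for i in 1 2 3 4 5; do mariadb --user=app --host=192.168.1.1 --port=3307 --password=secret test --execute='SELECT @@hostname, @@port'; done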

Monitoring of nodes in MariaDB MaxScale

To see what is going on inside the MariaDB MaxScale Load Balancer we used the following command as a simple real-time monitor:

# watch -d -n 1 ./maxctrl --user=admin --password=secret list servers

Switchover and Failover with MariaDB MaxScale

To use MariaDB MaxScale's switchover and failover capabilities, the Master/Master replication must be configured to use Global Transaction IDs (GTID). To change to GTID-based replication you can use the following commands on both masters:

SQL> STOP SLAVE;
SQL> CHANGE MASTER TO MASTER_USE_GTID = SLAVE_POS;
SQL> START SLAVE;
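
Whether the replication is really using GTID now can be verified on both masters, for example by checking the Using_Gtid field of the slave status and the current GTID positions:

SQL> SHOW SLAVE STATUS\G
SQL> SELECT @@gtid_slave_pos, @@gtid_binlog_pos;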

Switchover in MariaDB MaxScale

A graceful switchover is used in a controlled situation when we want to exchange the roles of the active Master and the passive Master. This can be used for example before a maintenance operation: We do the maintenance operation on the passive Master first, then we switch the roles, and then we can do the maintenance operation on the now passive former active Master:

maxctrl> call command mariadbmon switchover MultiMasterMonitor
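
If we want to be explicit, the mariadbmon module also accepts the new master and the current master as additional arguments (monitor, new master, old master):

maxctrl> call command mariadbmon switchover MultiMasterMonitor master2 master1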

After switching forth and back a few times we found out that the MariaDB MaxScale Monitor had replaced our replication user replication with its own user maxscale_monitor. This does not seem to be documented and I really do not like it, especially when it is done silently...
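
The change can be spotted in the Master_User field of the slave status on the passive Master:

SQL> SHOW SLAVE STATUS\G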

Failover in MariaDB MaxScale

To provoke/simulate a failover situation we stopped the active Master. Then we see in our monitor that the active Master is down and the passive Master is still running:

maxctrl> list servers
┌─────────┬──────────────┬──────┬─────────────┬────────────────┬────────────┐
│ Server  │ Address      │ Port │ Connections │ State          │ GTID       │
├─────────┼──────────────┼──────┼─────────────┼────────────────┼────────────┤
│ master1 │ 192.168.1.11 │ 3306 │ 0           │ Down           │ 0-3309-142 │
├─────────┼──────────────┼──────┼─────────────┼────────────────┼────────────┤
│ master2 │ 192.168.1.12 │ 3306 │ 1           │ Slave, Running │ 0-3309-142 │
└─────────┴──────────────┴──────┴─────────────┴────────────────┴────────────┘

We further observe that our application (the famous insert_test.sh) is not working anymore but is throwing errors... Then we trigger a manual failover:

maxctrl> call command mariadbmon failover MultiMasterMonitor

It looks like an automatic failover is possible (auto_failover = true), but we do not recommend this set-up and thus did not investigate this feature further.

After the manual failover the former passive Master becomes active Master:

maxctrl> list servers
┌─────────┬──────────────┬──────┬─────────────┬─────────────────┬───────────────┐
│ Server  │ Address      │ Port │ Connections │ State           │ GTID          │
├─────────┼──────────────┼──────┼─────────────┼─────────────────┼───────────────┤
│ master1 │ 192.168.1.11 │ 3306 │ 0           │ Down            │ 0-3308-365829 │
├─────────┼──────────────┼──────┼─────────────┼─────────────────┼───────────────┤
│ master2 │ 192.168.1.12 │ 3306 │ 0           │ Master, Running │ 0-3308-370235 │
└─────────┴──────────────┴──────┴─────────────┴─────────────────┴───────────────┘

Now we simulate the repair of the former failed active Master by just restarting it:

maxctrl> list servers
┌─────────┬──────────────┬──────┬─────────────┬─────────────────┬───────────────┐
│ Server  │ Address      │ Port │ Connections │ State           │ GTID          │
├─────────┼──────────────┼──────┼─────────────┼─────────────────┼───────────────┤
│ master1 │ 192.168.1.11 │ 3306 │ 0           │ Slave, Running  │ 0-3308-401309 │
├─────────┼──────────────┼──────┼─────────────┼─────────────────┼───────────────┤
│ master2 │ 192.168.1.12 │ 3306 │ 1           │ Master, Running │ 0-3308-401309 │
└─────────┴──────────────┴──────┴─────────────┴─────────────────┴───────────────┘

It comes back into MariaDB MaxScale, but only as a simple Slave. So a failover seems to break our Master/Master replication for some reason, whereas a simple switchover seems to work properly. This must be repaired afterwards. Whether this is a bug or intended behaviour I do not know... With the following command on master2 the Master/Master loop is closed again:

SQL> CHANGE MASTER TO master_host='192.168.1.11', master_port=3306, master_user='maxscale_monitor', master_password='secret', master_use_gtid=slave_pos;

SQL> START SLAVE;

Now everything is fine and working as expected again:

maxctrl> list servers
┌─────────┬──────────────┬──────┬─────────────┬──────────────────────────────┬───────────────┐
│ Server  │ Address      │ Port │ Connections │ State                        │ GTID          │
├─────────┼──────────────┼──────┼─────────────┼──────────────────────────────┼───────────────┤
│ master1 │ 192.168.1.11 │ 3306 │ 1           │ Relay Master, Slave, Running │ 0-3308-440194 │
├─────────┼──────────────┼──────┼─────────────┼──────────────────────────────┼───────────────┤
│ master2 │ 192.168.1.12 │ 3306 │ 0           │ Master, Running              │ 0-3308-440194 │
└─────────┴──────────────┴──────┴─────────────┴──────────────────────────────┴───────────────┘

Stopping and starting the passive Master had no effect at all.

Switchover with a lagging passive Master

To test this scenario we created an artificial lag on the passive Master by setting innodb_flush_log_at_trx_commit = 1 and sync_binlog = 1. When the passive Master was lagging far enough behind (about 30 seconds) we tried a switchover:

maxctrl> call command mariadbmon switchover MultiMasterMonitor
Error: timeout of 10000ms exceeded

So it is not really clear what happens in this case, and the error message does not really tell us what the problem is. Further, this operation somehow breaks the Master/Master replication again.
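
For completeness: the artificial lag was introduced on the passive Master roughly like this (a sketch; the Seconds_Behind_Master field of the slave status shows the resulting delay):

SQL> SET GLOBAL innodb_flush_log_at_trx_commit = 1;
SQL> SET GLOBAL sync_binlog = 1;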

Failover with a lagging passive Master

If we try a failover instead of a switchover we get at least a slightly more meaningful error message:

Error: Server at http://127.0.0.1:8989 responded with 403 Forbidden to `POST maxscale/modules/mariadbmon/failover?MultiMasterMonitor`
{
    "links": {
        "self": "http://127.0.0.1:8989/v1/maxscale/modules/mariadbmon/failover/"
    },
    "meta": {
        "errors": [
            {
                "detail": "Can not select 'master1' as a demotion target for failover because it is a running master."
            },
            {
                "detail": "Failover cancelled."
            }
        ]
    }
}

Draining a node with MariaDB MaxScale

If we try to drain the passive Master we get some warnings, but the result seems to be as expected:

maxctrl> drain server master2
Warning: Saving runtime modifications to 'ReadService' in '/home/mysql/product/maxscale/var/lib/maxscale/maxscale.cnf.d/ReadService.cnf'. The modified values will override the values found in the static configuration files.;Saving runtime modifications to 'WriteService' in '/home/mysql/product/maxscale/var/lib/maxscale/maxscale.cnf.d/WriteService.cnf'. The modified values will override the values found in the static configuration files.
To hide these warnings, run:

    export MAXCTRL_WARNINGS=0

Warning: Saving runtime modifications to 'ReadService' in '/home/mysql/product/maxscale/var/lib/maxscale/maxscale.cnf.d/ReadService.cnf'. The modified values will override the values found in the static configuration files.;Saving runtime modifications to 'WriteService' in '/home/mysql/product/maxscale/var/lib/maxscale/maxscale.cnf.d/WriteService.cnf'. The modified values will override the values found in the static configuration files.
To hide these warnings, run:

    export MAXCTRL_WARNINGS=0

OK
maxctrl> list servers
┌─────────┬──────────────┬──────┬─────────────┬──────────────────────┬───────────────┐
│ Server  │ Address      │ Port │ Connections │ State                │ GTID          │
├─────────┼──────────────┼──────┼─────────────┼──────────────────────┼───────────────┤
│ master1 │ 192.168.1.11 │ 3306 │ 0           │ Master, Running      │ 0-3308-631119 │
├─────────┼──────────────┼──────┼─────────────┼──────────────────────┼───────────────┤
│ master2 │ 192.168.1.12 │ 3306 │ 0           │ Maintenance, Running │ 0-3308-631119 │
└─────────┴──────────────┴──────┴─────────────┴──────────────────────┴───────────────┘

Un-draining a node with MariaDB MaxScale

For un-draining the node we tried:

maxctrl> clear server master2 drain
OK

At first glance everything seems to be OK. But the state of master2 was still Maintenance. The command:

maxctrl> clear server master2 maintenance

did the job... Whether this is a bug or should be considered an operator error I do not know...
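
Alternatively, the Maintenance state can also be set and cleared explicitly, without the drain logic:

maxctrl> set server master2 maintenance
maxctrl> clear server master2 maintenance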

Draining the active Master caused a switchover.

Sources