You are here

MySQL replication with filtering is dangerous

From time to time we see in customer engagements that MySQL Master/Slave replication is set-up doing schema or table level replication filtering. This can be done either on Master or on Slave. If filtering is done on the Master (by the binlog_{do|ignore}_db settings), the binary log becomes incomplete and cannot be used for a proper Point-in-Time-Recovery. Therefore FromDual recommends AGAINST this approach.

The replication filtering rules vary depending on the binary log format (ROW and STATEMENT) See also: How Servers Evaluate Replication Filtering Rules.

For reasons of data consistency between Master and Slave FromDual recommends to use only the binary log format ROW. This is also stated in the MySQL documentation: All changes can be replicated. This is the safest form of replication. Especially dangerous is binary log filtering with binary log format MIXED. This binary log format FromDual strongly discourages users to use.

The binary log format ROW affects only DML statements (UPDATE, INSERT, DELETE, etc.) but NOT DDL statements (CREATE, ALTER, DROP, etc.) and NOT DCL statements (CREATE, GRANT, REVOKE, DROP, etc.). So how are those statements replicated? They are replicated in STATEMENT binary log format even though binlog_format is set to ROW. This has the consequences that the binary log filtering rules of STATEMENT based replication and not the ones of ROW based replication apply when running one of those DDL or DCL statements.

This can easily cause problems. If you are lucky, they will cause the replication to break sooner or later, which you can detect and fix - but they may also cause inconsistencies between Master and Slave which may remain undetected for a long time.

Let us show what happens in 2 similar scenarios:

Scenario A: Filtering on mysql schema

On Slave we set the binary log filter as follows:

replicate_ignore_db = mysql

and verify it:

mysql> SHOW SLAVE STATUS\G
...
          Replicate_Ignore_DB: mysql
...

The intention of this filter setting is to not replicate user creations or modifications from Master to the Slave.

We verify on the Master, that binlog_format is set to the wanted value:

mysql> SHOW GLOBAL VARIABLES LIKE 'binlog_format';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW   |
+---------------+-------+

Now we do the following on the Master:

mysql> use mysql
mysql> CREATE USER 'inmysql'@'%';
mysql> use test
mysql> CREATE USER 'intest'@'%';

and verify the result on the Master:

mysql> SELECT user, host FROM mysql.user;
+-------------+-----------+
| user        | host      |
+-------------+-----------+
| inmysql     | %         |
| intest      | %         |
| mysql.sys   | localhost |
| root        | localhost |
+-------------+-----------+

and on the Slave:

mysql> SELECT user, host FROM mysql.user;
+-------------+-----------+
| user        | host      |
+-------------+-----------+
| intest      | %         |
| mysql.sys   | localhost |
| root        | localhost |
+-------------+-----------+

We see, that the user intest was replicated and the user inmysql was not. And we have clearly an unwanted data inconsistency between Master and Slave.

If we want to drop the inmysql user some time later on the Master:

mysql> use myapp;
mysql> DROP USER 'inmysql'@'%';

we get the following error message on the Slave and are wondering, why this user or the query appears on the Slave:

mysql> SHOW SLAVE STATUS\G
...
               Last_SQL_Errno: 1396
               Last_SQL_Error: Error 'Operation DROP USER failed for 'inmysql'@'%'' on query. Default database: 'test'. Query: 'DROP USER 'inmysql'@'%''
...

A similar problem happens when we connect to NO database on the Master as follows and change the users password:

shell> mysql -uroot
mysql> SELECT DATABASE();
+------------+
| database() |
+------------+
| NULL       |
+------------+
mysql> ALTER USER 'innone'@'%' IDENTIFIED BY 'secret';

This works perfectly on the Master. But what happens on the Slave:

mysql> SHOW SLAVE STATUS\G
...
               Last_SQL_Errno: 1396
               Last_SQL_Error: Error 'Operation ALTER USER failed for 'innone'@'%'' on query. Default database: ''. Query: 'ALTER USER 'innone'@'%' IDENTIFIED WITH 'mysql_native_password' AS '*14E65567ABDB5135D0CFD9A70B3032C179A49EE7''
...

The Slave wants to tell us in a complicated way, that the user innone does not exist on the Slave...

Scenario B: Filtering on tmp or similar schema

An other scenario we have seen recently is that the customer is filtering out tables with temporary data located in the tmp schema. Similar scenarios are cache, session or log tables. He did it as follows on the Master:

mysql> use tmp;
mysql> TRUNCATE TABLE tmp.test;

As he has learned in FromDual trainings he emptied the table with the TRUNCATE TABLE command instead of a DELETE FROM tmp.test command which is much less efficient than the TRUNCATE TABLE command. What he did not consider is, that the TRUNCATE TABLE command is a DDL command and not a DML command and thus the STATEMENT based replication filtering rules apply. His filtering rules on the Slave were as follows:

mysql> SHOW SLAVE STATUS\G
...
          Replicate_Ignore_DB: tmp
...

When we do the check on the Master we get an empty set as expected:

mysql> SELECT * FROM tmp.test;
Empty set (0.00 sec)

When we add new data on the Master:

mysql> INSERT INTO tmp.test VALUES (NULL, 'new data', CURRENT_TIMESTAMP());
mysql> SELECT * FROM tmp.test;
+----+-----------+---------------------+
| id | data      | ts                  |
+----+-----------+---------------------+
|  1 | new data  | 2017-01-11 18:00:11 |
+----+-----------+---------------------+

we get a different result set on the Slave:

mysql> SELECT * FROM tmp.test;
+----+-----------+---------------------+
| id | data      | ts                  |
+----+-----------+---------------------+
|  1 | old data  | 2017-01-11 17:58:55 |
+----+-----------+---------------------+

and in addition the replication stops working with the following error:

mysql> SHOW SLAVE STATUS\G
...
                   Last_Errno: 1062
                   Last_Error: Could not execute Write_rows event on table tmp.test; Duplicate entry '1' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log laptop4_qa57master_binlog.000042, end_log_pos 1572
...

See also our earlier bug report of a similar topic: Option "replicate_do_db" does not cause "create table" to replicate ('row' log)

Conclusion

Binary log filtering is extremely dangerous when you care about data consistency and thus FromDual recommends to avoid binary log filtering by all means. If you really have to do binary log filtering you should exactly know what you are doing, carefully test your set-up, check your application and your maintenance jobs and also review your future code changes regularly. Otherwise you risk data inconsistencies in your MySQL Master/Slave replication.