You are here
MySQL Tech-Feed (en)
Advanced MySQL Enterprise Training by FromDual
Due to the increasing demand FromDual has developed an Advanced MySQL Enterprise Training for DBAs and DevOps. After testing this training extensively with some selected customers last year we offer this MySQL Enterprise Training in 2018 for a broader audience.
The MySQL Enterprise Training addresses MySQL DBAs and DevOps which are already familiar with MySQL and approach now the challenge to operate a serious MySQL Enterprise infrastructure.
The topics of the 3 days MySQL Enterprise training you can find here.
You further have the opportunity to add 2 extra days of MySQL Performance Tuning from the Advanced MySQL Training.
We would be pleased to hold this training in-house in your company or at the location of one of our training partners in Essen, Berlin and Cologne (Germany).
For any question please contact us by eMail.
Taxonomy upgrade extras: trainingenterprisemysqladvancedMySQL 8.0.4-rc is out
Yesterday MySQL 8.0.4-rc came out. The Release Notes are quite long.
But caution: Do a BACKUP before upgrading...
I experienced some nice surprises. First I have to admit that I did not read the Release Notes or anything else. Reading manuals is for Girlies! Possibly something is written in there which is of importance. But I expect that it just works as usual...
I downloaded MySQL 8.0.4-rc and just want to upgrade my MySQL 8.0.3-rc testing system, where we did the 1M tables test.
First I got:
[MY-011096] No data dictionary version number found. [MY-010020] Data Dictionary initialization failed. [MY-010119] AbortingHmmm... Maybe something was not clean with the old system. So downgrade again:
[ERROR] [000000] InnoDB: Unsupported redo log format. The redo log was created with MySQL 8.0.4. Please follow the instructions at http://dev.mysql.com/doc/refman/8.0/en/upgrading-downgrading.html [ERROR] [000000] InnoDB: Plugin initialization aborted with error Generic error [ERROR] [003957] Failed to initialize DD Storage Engine [ERROR] [003634] Data Dictionary initialization failed. [ERROR] [003742] AbortingOK. Understandable. I should have done a backup before. But backup is for Girlies as well! Anyway this test system is not important. So I created a new instance from scratch which finally worked... Possibly just removing the redo log files as indicated would have helped as well.
MySQL 8.0.4-rc is out
Yesterday MySQL 8.0.4-rc came out. The Release Notes are quite long.
But caution: Do a BACKUP before upgrading...
I experienced some nice surprises. First I have to admit that I did not read the Release Notes or anything else. Reading manuals is for Girlies! Possibly something is written in there which is of importance. But I expect that it just works as usual...
I downloaded MySQL 8.0.4-rc and just want to upgrade my MySQL 8.0.3-rc testing system, where we did the 1M tables test.
First I got:
[MY-011096] No data dictionary version number found. [MY-010020] Data Dictionary initialization failed. [MY-010119] AbortingHmmm... Maybe something was not clean with the old system. So downgrade again:
[ERROR] [000000] InnoDB: Unsupported redo log format. The redo log was created with MySQL 8.0.4. Please follow the instructions at http://dev.mysql.com/doc/refman/8.0/en/upgrading-downgrading.html [ERROR] [000000] InnoDB: Plugin initialization aborted with error Generic error [ERROR] [003957] Failed to initialize DD Storage Engine [ERROR] [003634] Data Dictionary initialization failed. [ERROR] [003742] AbortingOK. Understandable. I should have done a backup before. But backup is for Girlies as well! Anyway this test system is not important. So I created a new instance from scratch which finally worked... Possibly just removing the redo log files as indicated would have helped as well.
Short term notice: Percona XtraDB Cluster training in English 7/8 February 2018 in Germany
FromDual offers short term a Percona XtraDB Cluster and MySQL Galera Cluster training (2 days) in English.
The training will take place in the Linuxhotel in Essen/Germany on February 7/8 2018.
There are already enough registrations so it is certain the training will take place. But there are still free places for some additional participants.
You can book online at the Linuxhotel.
Important: The Linuxhotel is nearly fully booked out. So accommodation is in nearby locations. The Linuxhotel will recommend you some locations.
The training is in English.
You can find the contents of this 2-day Percona XtraDB Cluster training here.
If you have any question please do not hesitate to contact us.
Taxonomy upgrade extras: galeraPercona XtraDB ClustertraininglinuxhotelShort term notice: Percona XtraDB Cluster training in English 7/8 February 2018 in Germany
FromDual offers short term a Percona XtraDB Cluster and MySQL Galera Cluster training (2 days) in English.
The training will take place in the Linuxhotel in Essen/Germany on February 7/8 2018.
There are already enough registrations so it is certain the training will take place. But there are still free places for some additional participants.
You can book online at the Linuxhotel.
Important: The Linuxhotel is nearly fully booked out. So accommodation is in nearby locations. The Linuxhotel will recommend you some locations.
The training is in English.
You can find the contents of this 2-day Percona XtraDB Cluster training here.
If you have any question please do not hesitate to contact us.
Taxonomy upgrade extras: galeraPercona XtraDB ClustertraininglinuxhotelAdvanced MySQL and MariaDB training in Cologne 2018
End of February, from February 26 to March 2 (5 days), FromDual offers an additional training for DBAs and DevOps: our most visited Advanced MySQL and MariaDB training.
This training is hold in the training facilities of the FromDual training partner GFU Cyrus GmbH in Cologne-Deutz (Germany).
There are already enough registrations so it is certain the training will take place. But there are still free places for at least 3 additional participants.
The training is in German.
You can find the training of this 5-day MySQL/MariaDB training here.
If you have any question please do not hesitate to contact us.
Taxonomy upgrade extras: trainingadvancedcologneAdvanced MySQL and MariaDB training in Cologne 2018
End of February, from February 26 to March 2 (5 days), FromDual offers an additional training for DBAs and DevOps: our most visited Advanced MySQL and MariaDB training.
This training is hold in the training facilities of the FromDual training partner GFU Cyrus GmbH in Cologne-Deutz (Germany).
There are already enough registrations so it is certain the training will take place. But there are still free places for at least 3 additional participants.
The training is in German.
You can find the training of this 5-day MySQL/MariaDB training here.
If you have any question please do not hesitate to contact us.
Taxonomy upgrade extras: trainingadvancedcologneOracle releases MySQL security vulnerability fixes 2018-01
As in every quarter of the year Oracle has released yesterday its recommendation for the MySQL security updates. This is called, in Oracle terminology, Critical Patch Update (CPU) Advisory.
This CPU is published for all Oracle products. But FromDual is only interested in MySQL related topics. So let us concentrate on those.
This time 25 fixes with a maximum score of 8.1 (out of 10.0) were published.
The following MySQL products are affected:
- MySQL Enterprise Monitor (3.3.6.3293 and before, 3.4.4.4226 and before, 4.0.0.5135 and before)
- MySQL Connector/Net (6.9.9. and before, 6.10.4 and before)
- MySQL Connector/ODBC (5.3.9. and before)
- MySQL Server (5.5.58 and before, 5.6.38 and before, 5.7.19 and before)
It is recommended to upgrade your MySQL products to close the security vulnerabilities.
FromDual upgrade decision aidBecause such security updates are published quarterly and some of our customers have dozens to hundreds of MySQL installations this would end up in a never ending story where you are continuously upgrading MySQL database servers and other products.
This led to idea to create an upgrade decision aid to decide if you have to upgrade to this CPU or not.
The following questions can be asked:
- How exposed is your database?
Databases can be located in various network segments. It is not recommended to expose databases directly to the internet. Databases are either installed in demilitarized zones (DMZ) with no direct access from the internet or in the companies private network (only company employees should be able to access the database) or even specialized secure networks (only a limited number of specific employees can access this network). - How critical are your data?
Some data are more interesting or critical, some data are less interesting or critical. Interesting data are: User data (user name and password), customer data (profiles, preferences, etc.), financial data (credit cards) and health care data (medical data). Systems containing such data are more critical than others. You can also ask: How sever is it if such data leak? - How broad is the user base able to access the database?
How many employees do you have in your company? How many contractors do you have in your company? How many employees have physical access to the database server? How good is the mood of those people?
How good are the user credentials to protect your database? Do you have shared passwords or no passwords at all? Do you have an account management (expiring old accounts, rotate passwords from time to time)?
How much do you trust your users? Do you trust all your employees? Do you trust only admins? Or do you not even trust your admins? - How severe are the security vulnerabilities?
You can define a threshold of severity of the vulnerabilities above you want to take actions. According to your criticality you can take actions for example as follows: Greater or equal than 7.5 if you have less critical data. Greater or equal than 6.0 if you have critical data. - Can the vulnerability be use from remote (over the network) and does it need a user authentication to exploit the vulnerability? What products (MySQL Enterprise Monitor, MySQL Server, MySQL Connectors) and what modules (Apache/Tomcat, .Net Connector, Partitioning, Stored Procedures, InnoDB, DDL, GIS, Optimizer, ODBC, Replication, DML, Performance Schema) are affected?
Depending on your readiness to take a risk you get now answers to decide if you have to take actions or not.
Some examples- Situation: Your database is exposed directly to the internet or you forgot to install some firewall rules to protect your MySQL port.
Analysis: You are probably affected by CVE-2018-2696 and CVE-2017-3737 (score 5.9 and 7.5). So you passed the threshold for non-critical data (7.5) and nearly passed the threshold for critical data (6.0). These vulnerabilities allow attacks over the network without user authentication.
Action: Immediate upgrade is recommended. Mid-term action: Install firewall rules to protect your MySQL to avoid access from remote and/or do not expose databases directly to the internet. - Situation: Your database is located in the intranet zone. You have slack user/password policies and you have many employees and also many contractors from foreign countries working on various projects. And you have very sensitive/interesting financial data stored in your database.
Analysis: Many people, not all of them are really trusted, have network access to the database. It is quite possible that passwords have been shared or people have passwords for projects they are not working for any more. You are affected by nearly all of the vulnerabilities (network).
Action: You should plan an upgrade soon. Mid-term action: Try to restrict access to the databases and implement some password policy rules (no shared passwords, password expiration, account locking etc.). - Situation: Your highly critical databases are located in a specially secured network and only applications, Linux admins and DBAs have access to this network. And you completely trust those people.
Analysis: Your threshold is 6.0 and (unauthenticated) attack over the network is not possible. There are some vulnerabilities of which you are affected but the database is only accessed by an application. So those vulnerabilities cannot be exploited easily.
Action: You possibly can ignore this CPU for the MySQL database this time. But you have a vulnerability in the .Net Connector (Connector/Net). If an attacker exploits the vulnerability on the Connector he possibly can get access to the data. So you have to upgrade the Connector of your application accessing the database.
If you follow the ideas of this aid you will probably have one or two upgrades a year. And this you should do anyway just to stay up to date...
See also Common Vulnerability Scoring System Version 3.0 Calculator.
Taxonomy upgrade extras: cpusecuritymysqlupgradeOracle releases MySQL security vulnerability fixes 2018-01
As in every quarter of the year Oracle has released yesterday its recommendation for the MySQL security updates. This is called, in Oracle terminology, Critical Patch Update (CPU) Advisory.
This CPU is published for all Oracle products. But FromDual is only interested in MySQL related topics. So let us concentrate on those.
This time 25 fixes with a maximum score of 8.1 (out of 10.0) were published.
The following MySQL products are affected:
- MySQL Enterprise Monitor (3.3.6.3293 and before, 3.4.4.4226 and before, 4.0.0.5135 and before)
- MySQL Connector/Net (6.9.9. and before, 6.10.4 and before)
- MySQL Connector/ODBC (5.3.9. and before)
- MySQL Server (5.5.58 and before, 5.6.38 and before, 5.7.19 and before)
It is recommended to upgrade your MySQL products to close the security vulnerabilities.
FromDual upgrade decision aidBecause such security updates are published quarterly and some of our customers have dozens to hundreds of MySQL installations this would end up in a never ending story where you are continuously upgrading MySQL database servers and other products.
This led to idea to create an upgrade decision aid to decide if you have to upgrade to this CPU or not.
The following questions can be asked:
- How exposed is your database?
Databases can be located in various network segments. It is not recommended to expose databases directly to the internet. Databases are either installed in demilitarized zones (DMZ) with no direct access from the internet or in the companies private network (only company employees should be able to access the database) or even specialized secure networks (only a limited number of specific employees can access this network). - How critical are your data?
Some data are more interesting or critical, some data are less interesting or critical. Interesting data are: User data (user name and password), customer data (profiles, preferences, etc.), financial data (credit cards) and health care data (medical data). Systems containing such data are more critical than others. You can also ask: How sever is it if such data leak? - How broad is the user base able to access the database?
How many employees do you have in your company? How many contractors do you have in your company? How many employees have physical access to the database server? How good is the mood of those people?
How good are the user credentials to protect your database? Do you have shared passwords or no passwords at all? Do you have an account management (expiring old accounts, rotate passwords from time to time)?
How much do you trust your users? Do you trust all your employees? Do you trust only admins? Or do you not even trust your admins? - How severe are the security vulnerabilities?
You can define a threshold of severity of the vulnerabilities above you want to take actions. According to your criticality you can take actions for example as follows: Greater or equal than 7.5 if you have less critical data. Greater or equal than 6.0 if you have critical data. - Can the vulnerability be use from remote (over the network) and does it need a user authentication to exploit the vulnerability? What products (MySQL Enterprise Monitor, MySQL Server, MySQL Connectors) and what modules (Apache/Tomcat, .Net Connector, Partitioning, Stored Procedures, InnoDB, DDL, GIS, Optimizer, ODBC, Replication, DML, Performance Schema) are affected?
Depending on your readiness to take a risk you get now answers to decide if you have to take actions or not.
Some examples- Situation: Your database is exposed directly to the internet or you forgot to install some firewall rules to protect your MySQL port.
Analysis: You are probably affected by CVE-2018-2696 and CVE-2017-3737 (score 5.9 and 7.5). So you passed the threshold for non-critical data (7.5) and nearly passed the threshold for critical data (6.0). These vulnerabilities allow attacks over the network without user authentication.
Action: Immediate upgrade is recommended. Mid-term action: Install firewall rules to protect your MySQL to avoid access from remote and/or do not expose databases directly to the internet. - Situation: Your database is located in the intranet zone. You have slack user/password policies and you have many employees and also many contractors from foreign countries working on various projects. And you have very sensitive/interesting financial data stored in your database.
Analysis: Many people, not all of them are really trusted, have network access to the database. It is quite possible that passwords have been shared or people have passwords for projects they are not working for any more. You are affected by nearly all of the vulnerabilities (network).
Action: You should plan an upgrade soon. Mid-term action: Try to restrict access to the databases and implement some password policy rules (no shared passwords, password expiration, account locking etc.). - Situation: Your highly critical databases are located in a specially secured network and only applications, Linux admins and DBAs have access to this network. And you completely trust those people.
Analysis: Your threshold is 6.0 and (unauthenticated) attack over the network is not possible. There are some vulnerabilities of which you are affected but the database is only accessed by an application. So those vulnerabilities cannot be exploited easily.
Action: You possibly can ignore this CPU for the MySQL database this time. But you have a vulnerability in the .Net Connector (Connector/Net). If an attacker exploits the vulnerability on the Connector he possibly can get access to the data. So you have to upgrade the Connector of your application accessing the database.
If you follow the ideas of this aid you will probably have one or two upgrades a year. And this you should do anyway just to stay up to date...
See also Common Vulnerability Scoring System Version 3.0 Calculator.
Taxonomy upgrade extras: cpusecuritymysqlupgradeGalera Cluster and Antivirus Scanner on Linux
Today we had to investigate in a very strange behaviour of IST and SST on a MariaDB Galera Cluster.
The symptom was, that some Galera Cluster nodes took a very long time to start. Up to 7 minutes. So the customer was concluding that the Galera Cluster node does an SST instead of an IST and was asking why the SST happens.
It have to be mentioned here, that the MariaDB error log is very confusing about whether it is an SST or an IST. So the customer was confused and concluded, that MariaDB Galera Cluster was doing an SST instead of IST.
Further confusing was that this behaviour was not consistently on all 3 nodes and not consistently on the 3 stages production, test and integration.
First we had to clear if the Galera node was doing an IST or an SST to exclude problems with Galera Cache or event Bugs in MariaDB Galera Cluster. For this we were running our famous insert_test.sh and did some node restarts with forcing SST and without.
As a Galera Cluster operator you must mandatorily be capable to determine which one of both State Transfers happens from the MariaDB error log:
MariaDB Error Log with IST on Joiner 2017-12-12 22:29:33 140158145914624 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 204013) 2017-12-12 22:29:33 140158426741504 [Note] WSREP: State transfer required: Group state: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:204013 Local state: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:201439 2017-12-12 22:29:33 140158426741504 [Note] WSREP: New cluster view: global state: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:204013, view# 7: Primary, number of nodes: 3, my index: 2, protocol version 3 2017-12-12 22:29:33 140158426741504 [Warning] WSREP: Gap in state sequence. Need state transfer. 2017-12-12 22:29:33 140158116558592 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '127.0.0.1' --datadir '/home/mysql/database/magal-101-b/data/' --defaults-file '/home/mysql/database/magal-101-b/etc/my.cnf' --parent '16426' --binlog '/home/mysql/database/magal-101-b/binlog/laptop4_magal-101-b__binlog' ' 2017-12-12 22:29:33 140158426741504 [Note] WSREP: Prepared SST request: rsync|127.0.0.1:4444/rsync_sst 2017-12-12 22:29:33 140158426741504 [Note] WSREP: REPL Protocols: 7 (3, 2) 2017-12-12 22:29:33 140158426741504 [Note] WSREP: Assign initial position for certification: 204013, protocol version: 3 2017-12-12 22:29:33 140158203852544 [Note] WSREP: Service thread queue flushed. 2017-12-12 22:29:33 140158426741504 [Note] WSREP: IST receiver addr using tcp://127.0.0.1:5681 2017-12-12 22:29:33 140158426741504 [Note] WSREP: Prepared IST receiver, listening at: tcp://127.0.0.1:5681 2017-12-12 22:29:33 140158145914624 [Note] WSREP: Member 2.0 (Node B) requested state transfer from 'Node C'. Selected 1.0 (Node C)(SYNCED) as donor. 2017-12-12 22:29:33 140158145914624 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 204050) 2017-12-12 22:29:33 140158426741504 [Note] WSREP: Requesting state transfer: success, donor: 1 2017-12-12 22:29:33 140158426741504 [Note] WSREP: GCache history reset: old(e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:0) -> new(e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:204013) 2017-12-12 22:29:33 140158145914624 [Note] WSREP: 1.0 (Node C): State transfer to 2.0 (Node B) complete. 2017-12-12 22:29:33 140158145914624 [Note] WSREP: Member 1.0 (Node C) synced with group. WSREP_SST: [INFO] Joiner cleanup. rsync PID: 16663 (20171212 22:29:34.474) WSREP_SST: [INFO] Joiner cleanup done. (20171212 22:29:34.980) 2017-12-12 22:29:34 140158427056064 [Note] WSREP: SST complete, seqno: 201439 2017-12-12 22:29:35 140158427056064 [Note] WSREP: Signalling provider to continue. 2017-12-12 22:29:35 140158427056064 [Note] WSREP: SST received: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:201439 2017-12-12 22:29:35 140158426741504 [Note] WSREP: Receiving IST: 2574 writesets, seqnos 201439-204013 2017-12-12 22:29:35 140158426741504 [Note] WSREP: IST received: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:204013 2017-12-12 22:29:35 140158145914624 [Note] WSREP: 2.0 (Node B): State transfer from 1.0 (Node C) complete. 2017-12-12 22:29:35 140158145914624 [Note] WSREP: Shifting JOINER -> JOINED (TO: 204534) 2017-12-12 22:29:35 140158145914624 [Note] WSREP: Member 2.0 (Node B) synced with group. 2017-12-12 22:29:35 140158145914624 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 204535) 2017-12-12 22:29:35 140158426741504 [Note] WSREP: Synchronized with group, ready for connectionsMariaDB Error Log with SST on Joiner 2017-12-12 22:32:15 139817123833600 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 239097) 2017-12-12 22:32:15 139817401395968 [Note] WSREP: State transfer required: Group state: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:239097 Local state: 00000000-0000-0000-0000-000000000000:-1 2017-12-12 22:32:15 139817401395968 [Note] WSREP: New cluster view: global state: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:239097, view# 9: Primary, number of nodes: 3, my index: 2, protocol version 3 2017-12-12 22:32:15 139817401395968 [Warning] WSREP: Gap in state sequence. Need state transfer. 2017-12-12 22:32:15 139817094477568 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '127.0.0.1' --datadir '/home/mysql/database/magal-101-b/data/' --defaults-file '/home/mysql/database/magal-101-b/etc/my.cnf' --parent '25291' --binlog '/home/mysql/database/magal-101-b/binlog/laptop4_magal-101-b__binlog' ' 2017-12-12 22:32:15 139817401395968 [Note] WSREP: Prepared SST request: rsync|127.0.0.1:4444/rsync_sst 2017-12-12 22:32:15 139817401395968 [Note] WSREP: REPL Protocols: 7 (3, 2) 2017-12-12 22:32:15 139817401395968 [Note] WSREP: Assign initial position for certification: 239097, protocol version: 3 2017-12-12 22:32:15 139817178507008 [Note] WSREP: Service thread queue flushed. 2017-12-12 22:32:15 139817401395968 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (e2fbbca5-df26-11e7-8ee2-bb61f8ff3774): 1 (Operation not permitted) at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable. 2017-12-12 22:32:15 139817123833600 [Note] WSREP: Member 2.0 (Node B) requested state transfer from 'Node C'. Selected 1.0 (Node C)(SYNCED) as donor. 2017-12-12 22:32:15 139817123833600 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 239136) 2017-12-12 22:32:15 139817401395968 [Note] WSREP: Requesting state transfer: success, donor: 1 2017-12-12 22:32:15 139817401395968 [Note] WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0) -> new(e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:239097) 2017-12-12 22:32:17 139817123833600 [Note] WSREP: 1.0 (Node C): State transfer to 2.0 (Node B) complete. 2017-12-12 22:32:17 139817123833600 [Note] WSREP: Member 1.0 (Node C) synced with group. WSREP_SST: [INFO] Joiner cleanup. rsync PID: 25520 (20171212 22:32:17.846) WSREP_SST: [INFO] Joiner cleanup done. (20171212 22:32:18.352) 2017-12-12 22:32:18 139817401710528 [Note] WSREP: SST complete, seqno: 239153 2017-12-12 22:32:18 139817132226304 [Note] WSREP: (ebfd9e9c, 'tcp://127.0.0.1:5680') turning message relay requesting off 2017-12-12 22:32:22 139817401710528 [Note] WSREP: Signalling provider to continue. 2017-12-12 22:32:22 139817401710528 [Note] WSREP: SST received: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:239153 2017-12-12 22:32:22 139817123833600 [Note] WSREP: 2.0 (Node B): State transfer from 1.0 (Node C) complete. 2017-12-12 22:32:22 139817123833600 [Note] WSREP: Shifting JOINER -> JOINED (TO: 239858) 2017-12-12 22:32:22 139817123833600 [Note] WSREP: Member 2.0 (Node B) synced with group. 2017-12-12 22:32:22 139817123833600 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 239866) 2017-12-12 22:32:22 139817401395968 [Note] WSREP: Synchronized with group, ready for connections
After we cleared that it really was an IST and that it was not a SST because of some other reasons the question rose: Why does an IST of only a few thousand transactions was taking 420 seconds. And this was not always the case...
So we were looking with top at the Donor and the Joiner during IST and we found that on the Donor node the Antivirus software was heavily using CPU (2 x 50%) and otherwise the system was doing nothing for a while and then suddenly started to transfer data over the network (possibly IST?).
Later we found, that the MariaDB datadir (/var/lib/mysql) was not excluded from the Antivirus software. And finally it looks like the Antivirus software was not properly configured by its Master server because the Antivirus software agent was from a cloned VM and not reinitialized. So the Antivirus Master server seems to be confused because there are 2 Antivirus software agents with the same ID.
Another very surprising situation which we did not expect was, that IST was much heavier influenced by the Antivirus software than SST. SST finished in a few seconds while IST took 420 seconds.
Conclusion: Be careful when using Antivirus software in combination with MariaDB Galera Cluster databases and exclude at least all database directories from virus scanning. If you want to be sure to avoid side effects (noisy neighbours) disable the Antivirus software on the database server at all and make sure by other means, that no virus is reaching your precious MariaDB Galera Cluster...
Taxonomy upgrade extras: Galera ClusterVirusAntivirusISTSSTnoisy neighboursGalera Cluster and Antivirus Scanner on Linux
Today we had to investigate in a very strange behaviour of IST and SST on a MariaDB Galera Cluster.
The symptom was, that some Galera Cluster nodes took a very long time to start. Up to 7 minutes. So the customer was concluding that the Galera Cluster node does an SST instead of an IST and was asking why the SST happens.
It have to be mentioned here, that the MariaDB error log is very confusing about whether it is an SST or an IST. So the customer was confused and concluded, that MariaDB Galera Cluster was doing an SST instead of IST.
Further confusing was that this behaviour was not consistently on all 3 nodes and not consistently on the 3 stages production, test and integration.
First we had to clear if the Galera node was doing an IST or an SST to exclude problems with Galera Cache or event Bugs in MariaDB Galera Cluster. For this we were running our famous insert_test.sh and did some node restarts with forcing SST and without.
As a Galera Cluster operator you must mandatorily be capable to determine which one of both State Transfers happens from the MariaDB error log:
MariaDB Error Log with IST on Joiner 2017-12-12 22:29:33 140158145914624 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 204013) 2017-12-12 22:29:33 140158426741504 [Note] WSREP: State transfer required: Group state: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:204013 Local state: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:201439 2017-12-12 22:29:33 140158426741504 [Note] WSREP: New cluster view: global state: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:204013, view# 7: Primary, number of nodes: 3, my index: 2, protocol version 3 2017-12-12 22:29:33 140158426741504 [Warning] WSREP: Gap in state sequence. Need state transfer. 2017-12-12 22:29:33 140158116558592 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '127.0.0.1' --datadir '/home/mysql/database/magal-101-b/data/' --defaults-file '/home/mysql/database/magal-101-b/etc/my.cnf' --parent '16426' --binlog '/home/mysql/database/magal-101-b/binlog/laptop4_magal-101-b__binlog' ' 2017-12-12 22:29:33 140158426741504 [Note] WSREP: Prepared SST request: rsync|127.0.0.1:4444/rsync_sst 2017-12-12 22:29:33 140158426741504 [Note] WSREP: REPL Protocols: 7 (3, 2) 2017-12-12 22:29:33 140158426741504 [Note] WSREP: Assign initial position for certification: 204013, protocol version: 3 2017-12-12 22:29:33 140158203852544 [Note] WSREP: Service thread queue flushed. 2017-12-12 22:29:33 140158426741504 [Note] WSREP: IST receiver addr using tcp://127.0.0.1:5681 2017-12-12 22:29:33 140158426741504 [Note] WSREP: Prepared IST receiver, listening at: tcp://127.0.0.1:5681 2017-12-12 22:29:33 140158145914624 [Note] WSREP: Member 2.0 (Node B) requested state transfer from 'Node C'. Selected 1.0 (Node C)(SYNCED) as donor. 2017-12-12 22:29:33 140158145914624 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 204050) 2017-12-12 22:29:33 140158426741504 [Note] WSREP: Requesting state transfer: success, donor: 1 2017-12-12 22:29:33 140158426741504 [Note] WSREP: GCache history reset: old(e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:0) -> new(e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:204013) 2017-12-12 22:29:33 140158145914624 [Note] WSREP: 1.0 (Node C): State transfer to 2.0 (Node B) complete. 2017-12-12 22:29:33 140158145914624 [Note] WSREP: Member 1.0 (Node C) synced with group. WSREP_SST: [INFO] Joiner cleanup. rsync PID: 16663 (20171212 22:29:34.474) WSREP_SST: [INFO] Joiner cleanup done. (20171212 22:29:34.980) 2017-12-12 22:29:34 140158427056064 [Note] WSREP: SST complete, seqno: 201439 2017-12-12 22:29:35 140158427056064 [Note] WSREP: Signalling provider to continue. 2017-12-12 22:29:35 140158427056064 [Note] WSREP: SST received: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:201439 2017-12-12 22:29:35 140158426741504 [Note] WSREP: Receiving IST: 2574 writesets, seqnos 201439-204013 2017-12-12 22:29:35 140158426741504 [Note] WSREP: IST received: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:204013 2017-12-12 22:29:35 140158145914624 [Note] WSREP: 2.0 (Node B): State transfer from 1.0 (Node C) complete. 2017-12-12 22:29:35 140158145914624 [Note] WSREP: Shifting JOINER -> JOINED (TO: 204534) 2017-12-12 22:29:35 140158145914624 [Note] WSREP: Member 2.0 (Node B) synced with group. 2017-12-12 22:29:35 140158145914624 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 204535) 2017-12-12 22:29:35 140158426741504 [Note] WSREP: Synchronized with group, ready for connectionsMariaDB Error Log with SST on Joiner 2017-12-12 22:32:15 139817123833600 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 239097) 2017-12-12 22:32:15 139817401395968 [Note] WSREP: State transfer required: Group state: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:239097 Local state: 00000000-0000-0000-0000-000000000000:-1 2017-12-12 22:32:15 139817401395968 [Note] WSREP: New cluster view: global state: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:239097, view# 9: Primary, number of nodes: 3, my index: 2, protocol version 3 2017-12-12 22:32:15 139817401395968 [Warning] WSREP: Gap in state sequence. Need state transfer. 2017-12-12 22:32:15 139817094477568 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '127.0.0.1' --datadir '/home/mysql/database/magal-101-b/data/' --defaults-file '/home/mysql/database/magal-101-b/etc/my.cnf' --parent '25291' --binlog '/home/mysql/database/magal-101-b/binlog/laptop4_magal-101-b__binlog' ' 2017-12-12 22:32:15 139817401395968 [Note] WSREP: Prepared SST request: rsync|127.0.0.1:4444/rsync_sst 2017-12-12 22:32:15 139817401395968 [Note] WSREP: REPL Protocols: 7 (3, 2) 2017-12-12 22:32:15 139817401395968 [Note] WSREP: Assign initial position for certification: 239097, protocol version: 3 2017-12-12 22:32:15 139817178507008 [Note] WSREP: Service thread queue flushed. 2017-12-12 22:32:15 139817401395968 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (e2fbbca5-df26-11e7-8ee2-bb61f8ff3774): 1 (Operation not permitted) at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable. 2017-12-12 22:32:15 139817123833600 [Note] WSREP: Member 2.0 (Node B) requested state transfer from 'Node C'. Selected 1.0 (Node C)(SYNCED) as donor. 2017-12-12 22:32:15 139817123833600 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 239136) 2017-12-12 22:32:15 139817401395968 [Note] WSREP: Requesting state transfer: success, donor: 1 2017-12-12 22:32:15 139817401395968 [Note] WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0) -> new(e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:239097) 2017-12-12 22:32:17 139817123833600 [Note] WSREP: 1.0 (Node C): State transfer to 2.0 (Node B) complete. 2017-12-12 22:32:17 139817123833600 [Note] WSREP: Member 1.0 (Node C) synced with group. WSREP_SST: [INFO] Joiner cleanup. rsync PID: 25520 (20171212 22:32:17.846) WSREP_SST: [INFO] Joiner cleanup done. (20171212 22:32:18.352) 2017-12-12 22:32:18 139817401710528 [Note] WSREP: SST complete, seqno: 239153 2017-12-12 22:32:18 139817132226304 [Note] WSREP: (ebfd9e9c, 'tcp://127.0.0.1:5680') turning message relay requesting off 2017-12-12 22:32:22 139817401710528 [Note] WSREP: Signalling provider to continue. 2017-12-12 22:32:22 139817401710528 [Note] WSREP: SST received: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:239153 2017-12-12 22:32:22 139817123833600 [Note] WSREP: 2.0 (Node B): State transfer from 1.0 (Node C) complete. 2017-12-12 22:32:22 139817123833600 [Note] WSREP: Shifting JOINER -> JOINED (TO: 239858) 2017-12-12 22:32:22 139817123833600 [Note] WSREP: Member 2.0 (Node B) synced with group. 2017-12-12 22:32:22 139817123833600 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 239866) 2017-12-12 22:32:22 139817401395968 [Note] WSREP: Synchronized with group, ready for connections
After we cleared that it really was an IST and that it was not a SST because of some other reasons the question rose: Why does an IST of only a few thousand transactions was taking 420 seconds. And this was not always the case...
So we were looking with top at the Donor and the Joiner during IST and we found that on the Donor node the Antivirus software was heavily using CPU (2 x 50%) and otherwise the system was doing nothing for a while and then suddenly started to transfer data over the network (possibly IST?).
Later we found, that the MariaDB datadir (/var/lib/mysql) was not excluded from the Antivirus software. And finally it looks like the Antivirus software was not properly configured by its Master server because the Antivirus software agent was from a cloned VM and not reinitialized. So the Antivirus Master server seems to be confused because there are 2 Antivirus software agents with the same ID.
Another very surprising situation which we did not expect was, that IST was much heavier influenced by the Antivirus software than SST. SST finished in a few seconds while IST took 420 seconds.
Conclusion: Be careful when using Antivirus software in combination with MariaDB Galera Cluster databases and exclude at least all database directories from virus scanning. If you want to be sure to avoid side effects (noisy neighbours) disable the Antivirus software on the database server at all and make sure by other means, that no virus is reaching your precious MariaDB Galera Cluster...
Taxonomy upgrade extras: Galera ClusterVirusAntivirusISTSSTnoisy neighboursFirst Docker steps with MySQL and MariaDB
The Docker version of the distributions are often quite old. On Ubuntu 16.04 for example:
shell> docker --version Docker version 1.13.1, build 092cba3But the current docker version is 17.09.0-ce (2017-09-26). It seems like they have switched from the old version schema x.y.z to the new year.month.version version schema in February/March 2017.
Install Docker CE RepositoryAdd the Docker's official PGP key:
shell> curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add - OKAdd the Docker repository:
shell> echo "deb [arch=amd64] https://download.docker.com/linux/ubuntu \ $(lsb_release -cs) \ stable" > /etc/apt/sources.list.d/docker.list shell> apt-get updateInstall or upgrade Docker:
shell> apt-get install docker-ce shell> docker --version Docker version 17.09.0-ce, build afdb6d4To test your Docker installation run:
shell> docker run --rm hello-worldAdd Docker containers for MariaDB, MySQL and MySQL Enterprise Edition
First we want to see what Docker containers are available:
shell> docker search mysql --no-trunc --filter=stars=100 NAME DESCRIPTION STARS OFFICIAL AUTOMATED mysql MySQL is a widely used, open-source relational database management system (RDBMS). 5273 [OK] mariadb MariaDB is a community-developed fork of MySQL intended to remain free under the GNU GPL. 1634 [OK] mysql/mysql-server Optimized MySQL Server Docker images. Created, maintained and supported by the MySQL team at Oracle 368 [OK] percona Percona Server is a fork of the MySQL relational database management system created by Percona. 303 [OK] ...OK. It seems like MySQL Server Enterprise Edition is missing. So we have to create an account on Docker Store and get the MySQL Server Enterprise Edition Image from there:
shell> docker login --username=fromdual Password: Login SucceededUnfortunately one can still not see MySQL Server Enterprise Edition.
But we can try anyway:
shell> docker pull store/oracle/mysql-enterprise-server:5.7 shell> docker logout shell> docker pull mysql shell> docker pull mariadb shell> docker pull mysql/mysql-serverTo see what is going on on your local Docker registry you can type:
shell> docker images REPOSITORY TAG IMAGE ID CREATED SIZE mariadb latest abcee1d29aac 8 days ago 396MB mysql latest 5709795eeffa 2 weeks ago 408MB mysql/mysql-server latest a3ee341faefb 5 weeks ago 246MB store/oracle/mysql-enterprise-server 5.7 41bf2fa0b4a1 4 months ago 244MB hello-world latest 48b5124b2768 10 months ago 1.84kBI personally do not like that all those images which are tagged with latest because I want a clear control over what version is used. MariaDB and MySQL community server have implemented this quite nicely but not MySQL Enterprise Edition:
shell> docker pull mariadb:10.0 shell> docker pull mariadb:10.0.23 shell> docker pull mysql:8.0 shell> docker pull mysql:8.0.3 docker images | sort REPOSITORY TAG IMAGE ID CREATED SIZE hello-world latest 48b5124b2768 10 months ago 1.84kB mariadb 10.0.23 93631b528e67 21 months ago 305MB mariadb 10.0 eecd58425049 8 days ago 337MB mariadb latest abcee1d29aac 8 days ago 396MB mysql 8.0.3 e691422324d8 2 weeks ago 343MB mysql 8.0 e691422324d8 2 weeks ago 343MB mysql latest 5709795eeffa 2 weeks ago 408MB mysql/mysql-server latest a3ee341faefb 5 weeks ago 246MB store/oracle/mysql-enterprise-server 5.7 41bf2fa0b4a1 4 months ago 244MBRun a MariaDB server container
Start a new Docker container from the MariaDB image by running:
shell> CONTAINER_NAME=mariadb shell> CONTAINER_IMAGE=mariadb shell> TAG=latest shell> MYSQL_ROOT_PASSWORD=Secret-123 shell> MYSQL_ROOT_USER=root shell> docker run \ --name=${CONTAINER_NAME} \ --detach \ --env=MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD} \ ${CONTAINER_IMAGE}:${TAG} shell> docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 60d7b6de7ed1 mariadb:latest "docker-entrypoint..." 24 seconds ago Up 23 seconds 3306/tcp mariadb shell> docker logs ${CONTAINER_NAME} shell> docker exec \ --interactive \ --tty \ ${CONTAINER_NAME} \ mysql --user=${MYSQL_ROOT_USER} --password=${MYSQL_ROOT_PASSWORD} --execute="status" shell> docker image tag mariadb:latest mariadb:10.2.10 shell> docker exec --interactive \ --tty \ ${CONTAINER_NAME} \ bash shell> docker stop ${CONTAINER_NAME} shell> docker rm ${CONTAINER_NAME}Run a MySQL Community server container shell> CONTAINER_NAME=mysql shell> CONTAINER_IMAGE=mysql/mysql-server shell> TAG=latest shell> MYSQL_ROOT_PASSWORD=Secret-123 shell> docker run \ --name=${CONTAINER_NAME} \ --detach \ --env=MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD} \ ${CONTAINER_IMAGE}:${TAG} shell> docker stop ${CONTAINER_NAME} shell> docker rm ${CONTAINER_NAME}
Run a MySQL Server Enterprise Edition container shell> CONTAINER_NAME=mysql-ee shell> CONTAINER_IMAGE=store/oracle/mysql-enterprise-server shell> TAG=5.7 shell> MYSQL_ROOT_PASSWORD=Secret-123 shell> docker run \ --name=${CONTAINER_NAME} \ --detach \ --env=MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD} \ ${CONTAINER_IMAGE}:${TAG} shell> docker ps --all CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 0cb4e6a8a621 store/oracle/mysql-enterprise-server:5.7 "/entrypoint.sh my..." 37 seconds ago Up 36 seconds (healthy) 3306/tcp, 33060/tcp mysql-ee 1832b98da6ef mysql:latest "docker-entrypoint..." 6 minutes ago Up 6 minutes 3306/tcp mysql 60d7b6de7ed1 mariadb:latest "docker-entrypoint..." 21 minutes ago Up 21 minutes 3306/tcp mariadb
All my 3 docker containers are currently running as root:
shell> ps -ef | grep docker root 13177 1 20:20 ? 00:00:44 /usr/bin/dockerd -H fd:// root 13186 13177 20:20 ? 00:00:04 docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --shim docker-containerd-shim --runtime docker-runc root 24004 13186 21:41 ? 00:00:00 docker-containerd-shim 60d7b6de7ed1ff62b67e66c6effce0094fd60e9565ede65fa34e188b636c54ec /var/run/docker/libcontainerd/60d7b6de7ed1ff62b67e66c6effce0094fd60e9565ede65fa34e188b636c54ec docker-runc root 26593 13186 21:56 ? 00:00:00 docker-containerd-shim 1832b98da6ef7459c33181e9b9ddd89a4136c3b2676335bcbbb533389cbf6219 /var/run/docker/libcontainerd/1832b98da6ef7459c33181e9b9ddd89a4136c3b2676335bcbbb533389cbf6219 docker-runc root 27714 13186 22:02 ? 00:00:00 docker-containerd-shim 0cb4e6a8a62103b66164ccddd028217bb4012d8a6aad1f62d3ed6ae71e1a38b4 /var/run/docker/libcontainerd/0cb4e6a8a62103b66164ccddd028217bb4012d8a6aad1f62d3ed6ae71e1a38b4 docker-runcBut the user running the process IN the container is not root:
shell> docker exec \ --interactive \ --tty \ ${CONTAINER_NAME} \ grep ^Uid /proc/1/status Uid: 27 27 27 27 shell> docker exec \ --interactive \ --tty \ ${CONTAINER_NAME} \ bash -c "id 27" uid=27(mysql) gid=27(mysql) groups=27(mysql)Run a Docker container from mysql user shell> id uid=1001(mysql) gid=1001(mysql) groups=1001(mysql) shell> docker images Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.32/images/json: dial unix /var/run/docker.sock: connect: permission denied shell> sudo adduser mysql docker Adding user `mysql' to group `docker' ... Adding user mysql to group docker Done.
Taxonomy upgrade extras: dockermysqlmariadb
First Docker steps with MySQL and MariaDB
The Docker version of the distributions are often quite old. On Ubuntu 16.04 for example:
shell> docker --version Docker version 1.13.1, build 092cba3But the current docker version is 17.09.0-ce (2017-09-26). It seems like they have switched from the old version schema x.y.z to the new year.month.version version schema in February/March 2017.
Install Docker CE RepositoryAdd the Docker's official PGP key:
shell> curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add - OKAdd the Docker repository:
shell> echo "deb [arch=amd64] https://download.docker.com/linux/ubuntu \ $(lsb_release -cs) \ stable" > /etc/apt/sources.list.d/docker.list shell> apt-get updateInstall or upgrade Docker:
shell> apt-get install docker-ce shell> docker --version Docker version 17.09.0-ce, build afdb6d4To test your Docker installation run:
shell> docker run --rm hello-worldAdd Docker containers for MariaDB, MySQL and MySQL Enterprise Edition
First we want to see what Docker containers are available:
shell> docker search mysql --no-trunc --filter=stars=100 NAME DESCRIPTION STARS OFFICIAL AUTOMATED mysql MySQL is a widely used, open-source relational database management system (RDBMS). 5273 [OK] mariadb MariaDB is a community-developed fork of MySQL intended to remain free under the GNU GPL. 1634 [OK] mysql/mysql-server Optimized MySQL Server Docker images. Created, maintained and supported by the MySQL team at Oracle 368 [OK] percona Percona Server is a fork of the MySQL relational database management system created by Percona. 303 [OK] ...OK. It seems like MySQL Server Enterprise Edition is missing. So we have to create an account on Docker Store and get the MySQL Server Enterprise Edition Image from there:
shell> docker login --username=fromdual Password: Login SucceededUnfortunately one can still not see MySQL Server Enterprise Edition.
But we can try anyway:
shell> docker pull store/oracle/mysql-enterprise-server:5.7 shell> docker logout shell> docker pull mysql shell> docker pull mariadb shell> docker pull mysql/mysql-serverTo see what is going on on your local Docker registry you can type:
shell> docker images REPOSITORY TAG IMAGE ID CREATED SIZE mariadb latest abcee1d29aac 8 days ago 396MB mysql latest 5709795eeffa 2 weeks ago 408MB mysql/mysql-server latest a3ee341faefb 5 weeks ago 246MB store/oracle/mysql-enterprise-server 5.7 41bf2fa0b4a1 4 months ago 244MB hello-world latest 48b5124b2768 10 months ago 1.84kBI personally do not like that all those images which are tagged with latest because I want a clear control over what version is used. MariaDB and MySQL community server have implemented this quite nicely but not MySQL Enterprise Edition:
shell> docker pull mariadb:10.0 shell> docker pull mariadb:10.0.23 shell> docker pull mysql:8.0 shell> docker pull mysql:8.0.3 docker images | sort REPOSITORY TAG IMAGE ID CREATED SIZE hello-world latest 48b5124b2768 10 months ago 1.84kB mariadb 10.0.23 93631b528e67 21 months ago 305MB mariadb 10.0 eecd58425049 8 days ago 337MB mariadb latest abcee1d29aac 8 days ago 396MB mysql 8.0.3 e691422324d8 2 weeks ago 343MB mysql 8.0 e691422324d8 2 weeks ago 343MB mysql latest 5709795eeffa 2 weeks ago 408MB mysql/mysql-server latest a3ee341faefb 5 weeks ago 246MB store/oracle/mysql-enterprise-server 5.7 41bf2fa0b4a1 4 months ago 244MBRun a MariaDB server container
Start a new Docker container from the MariaDB image by running:
shell> CONTAINER_NAME=mariadb shell> CONTAINER_IMAGE=mariadb shell> TAG=latest shell> MYSQL_ROOT_PASSWORD=Secret-123 shell> MYSQL_ROOT_USER=root shell> docker run \ --name=${CONTAINER_NAME} \ --detach \ --env=MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD} \ ${CONTAINER_IMAGE}:${TAG} shell> docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 60d7b6de7ed1 mariadb:latest "docker-entrypoint..." 24 seconds ago Up 23 seconds 3306/tcp mariadb shell> docker logs ${CONTAINER_NAME} shell> docker exec \ --interactive \ --tty \ ${CONTAINER_NAME} \ mysql --user=${MYSQL_ROOT_USER} --password=${MYSQL_ROOT_PASSWORD} --execute="status" shell> docker image tag mariadb:latest mariadb:10.2.10 shell> docker exec --interactive \ --tty \ ${CONTAINER_NAME} \ bash shell> docker stop ${CONTAINER_NAME} shell> docker rm ${CONTAINER_NAME}Run a MySQL Community server container shell> CONTAINER_NAME=mysql shell> CONTAINER_IMAGE=mysql/mysql-server shell> TAG=latest shell> MYSQL_ROOT_PASSWORD=Secret-123 shell> docker run \ --name=${CONTAINER_NAME} \ --detach \ --env=MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD} \ ${CONTAINER_IMAGE}:${TAG} shell> docker stop ${CONTAINER_NAME} shell> docker rm ${CONTAINER_NAME}
Run a MySQL Server Enterprise Edition container shell> CONTAINER_NAME=mysql-ee shell> CONTAINER_IMAGE=store/oracle/mysql-enterprise-server shell> TAG=5.7 shell> MYSQL_ROOT_PASSWORD=Secret-123 shell> docker run \ --name=${CONTAINER_NAME} \ --detach \ --env=MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD} \ ${CONTAINER_IMAGE}:${TAG} shell> docker ps --all CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 0cb4e6a8a621 store/oracle/mysql-enterprise-server:5.7 "/entrypoint.sh my..." 37 seconds ago Up 36 seconds (healthy) 3306/tcp, 33060/tcp mysql-ee 1832b98da6ef mysql:latest "docker-entrypoint..." 6 minutes ago Up 6 minutes 3306/tcp mysql 60d7b6de7ed1 mariadb:latest "docker-entrypoint..." 21 minutes ago Up 21 minutes 3306/tcp mariadb
All my 3 docker containers are currently running as root:
shell> ps -ef | grep docker root 13177 1 20:20 ? 00:00:44 /usr/bin/dockerd -H fd:// root 13186 13177 20:20 ? 00:00:04 docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --shim docker-containerd-shim --runtime docker-runc root 24004 13186 21:41 ? 00:00:00 docker-containerd-shim 60d7b6de7ed1ff62b67e66c6effce0094fd60e9565ede65fa34e188b636c54ec /var/run/docker/libcontainerd/60d7b6de7ed1ff62b67e66c6effce0094fd60e9565ede65fa34e188b636c54ec docker-runc root 26593 13186 21:56 ? 00:00:00 docker-containerd-shim 1832b98da6ef7459c33181e9b9ddd89a4136c3b2676335bcbbb533389cbf6219 /var/run/docker/libcontainerd/1832b98da6ef7459c33181e9b9ddd89a4136c3b2676335bcbbb533389cbf6219 docker-runc root 27714 13186 22:02 ? 00:00:00 docker-containerd-shim 0cb4e6a8a62103b66164ccddd028217bb4012d8a6aad1f62d3ed6ae71e1a38b4 /var/run/docker/libcontainerd/0cb4e6a8a62103b66164ccddd028217bb4012d8a6aad1f62d3ed6ae71e1a38b4 docker-runcBut the user running the process IN the container is not root:
shell> docker exec \ --interactive \ --tty \ ${CONTAINER_NAME} \ grep ^Uid /proc/1/status Uid: 27 27 27 27 shell> docker exec \ --interactive \ --tty \ ${CONTAINER_NAME} \ bash -c "id 27" uid=27(mysql) gid=27(mysql) groups=27(mysql)Run a Docker container from mysql user shell> id uid=1001(mysql) gid=1001(mysql) groups=1001(mysql) shell> docker images Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.32/images/json: dial unix /var/run/docker.sock: connect: permission denied shell> sudo adduser mysql docker Adding user `mysql' to group `docker' ... Adding user mysql to group docker Done.
Taxonomy upgrade extras: dockermysqlmariadb
MariaDB master/master GTID based replication with keepalived VIP
Some of our customers still want to have old-style MariaDB master/master replication clusters. Time goes by, new technologies appear but some old stuff still remains.
The main problem in a master/master replication set-up is to make the service highly available for the application (applications typically cannot deal with more than one point-of-contact). This can be achieved with a load balancer (HAproxy, Galera Load Balancer (GLB), ProxySQL or MaxScale) in front of the MariaDB master/master replication cluster. But the load balancer by it-self should also become highly available. And this is typically achieved by a virtual IP (VIP) in front of one of the load balancers. To make operations of the VIP more handy the VIP is controlled by a service like keepalived or corosync.
Because I like simple solutions (I am a strong believer in the KISS principle) I thought about avoiding the load balancer in the middle and attach the VIP directly to the master/master replication servers and let them to be controlled by keepalived as well.
Important: A master/master replication set-up is vulnerable to split-brain situations. Neither keepalived nor the master/master replication helps you to avoid conflicts and in any way to prevent this situation. If you are sensitive to split-brain situations you should look for Galera Cluster. Keepalived is made for stateless services like load balancers, etc. but not databases.
Set-up a MariaDB master/master replication clusterBecause most of the Linux distributions have a bit old versions of software delivered we use the MariaDB 10.2 repository from the MariaDB website:
# # /etc/yum.repos.d/MariaDB-10.2.repo # # MariaDB 10.2 CentOS repository list - created 2017-11-08 20:32 UTC # http://downloads.mariadb.org/mariadb/repositories/ # [mariadb] name = MariaDB baseurl = http://yum.mariadb.org/10.2/centos7-amd64 gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB gpgcheck=1Then we install the MariaDB server and start it:
shell> yum makecache shell> yum install MariaDB-server MariaDB-client shell> systemctl start mariadb shell> systemctl enabled mariadbFor the MariaDB master/master replication set-up configuration we use the following parameters:
# # /etc/my.cnf # [mysqld] server_id = 1 # 2 on the other node log_bin = binlog-m1 # binlog-m2 on the other node log_slave_updates = 1 gtid_domain_id = 1 # 2 on the other node gtid_strict_mode = On auto_increment_increment = 2 auto_increment_offset = 1 # 2 on the other node read_only = On # super_read_only for MySQL 5.7 and newerThen we close the master/master replication ring according to: Starting with empty server.
mariadb> SET GLOBAL gtid_slave_pos = ""; mariadb> CHANGE MASTER TO master_host="192.168.56.101", master_user="replication" , master_use_gtid=current_pos; mariadb> START SLAVE;Installing keepalived
Literature:
The next step is to install and configure keepalived. This can be done as follows:
shell> yum install keepalived shell> systemctl enable keepalivedImportant: In my tests I got crashes and core dumps with keepalived which disappeared after a full upgrade of CentOS 7.
Configuring keepalivedThe most important part is the keepalived configuration file:
# # /etc/keepalived/keepalived.conf # global_defs { notification_email { root@localhost dba@example.com } notification_email_from root@master1 # master2 on the other node smtp_server localhost 25 router_id MARIADB_MM enable_script_security } # Health checks vrrp_script chk_mysql { script "/usr/sbin/pidof mysqld" weight 2 # Is relevant for the diff in priority interval 1 # every ... seconds timeout 3 # script considered failed after ... seconds fall 3 # number of failures for K.O. rise 1 # number of success for OK } vrrp_script chk_failover { script "/etc/keepalived/chk_failover.sh" weight -4 # Is relevant for the diff in priority interval 1 # every ... seconds timeout 1 # script considered failed after ... seconds fall 1 # number of failures for K.O. rise 1 # number of success for OK } # Main configuration vrrp_instance VI_MM_VIP { state MASTER # BACKUP on the other side interface enp0s9 # private heartbeat interface priority 100 # Higher means: elected first (BACKUP: 99) virtual_router_id 42 # ID for all nodes of Cluster group debug 0 # 0 .. 4, seems not to work? unicast_src_ip 192.168.56.101 # Our private IP address unicast_peer { 192.168.56.102 # Peers private IP address } # For keepalived communication authentication { auth_type PASS auth_pass Secr3t! } # VIP to move around virtual_ipaddress { 192.168.1.99/24 dev enp0s8 # public interface for VIP } # Check health of local system. See vrrp_script above. track_script { chk_mysql # If File /etc/keepalived/failover is touched failover is triggered # Similar can be reached when priority is lowered followed by a reload chk_failover } # When node becomes MASTER this script is triggered notify_master "/etc/keepalived/keepalived_master.sh --user=root --password= --wait=yes --variable=read_only" # When node becomes SLAVE this script is triggered notify_backup "/etc/keepalived/keepalived_backup.sh --user=root --password= --kill=yes --variable=read_only" # Possibly fault and stop should also call keepalived_backup.sh to be on the safe side... notify_fault "/etc/keepalived/keepalived_fault.sh arg1 arg2" notify_stop "/etc/keepalived/keepalived_stop.sh arg1 arg2" # ANY state transit is triggered notify /etc/keepalived/keepalived_notify.sh smtp_alert # send notification during state transit }With the command:
shell> systemctl restart keepalivedthe service is started and/or the configuration is reloaded.
The scripts we used in the configuration file are the following:
chk_failover.sh keepalived_backup.sh keepalived_fault.sh keepalived_master.sh keepalived_notify.sh keepalived_stop.sh#!/bin/bash # # /etc/keepalived/keepalived_notify.sh # TYPE=${1} NAME=${2} STATE=${3} PRIORITY=${4} TS=$(date '+%Y-%m-%d_%H:%M:%S') LOG=/etc/keepalived/keepalived_notify.log echo $TS $0 $@ >>${LOG}
#!/bin/bash # # /etc/keepalived/chk_failover.sh # /usr/bin/stat /etc/keepalived/failover 2>/dev/null 1>&2 if [ ${?} -eq 0 ] ; then exit 1 else exit 0 fi
To make MariaDB master/master replication more robust against replication problems we took the following (configurable) actions on the database side:
Getting the MASTER role:
- Waiting for catch-up replication
- Make the MariaDB instance read/write
Getting the BACKUP role:
- Make the MariaDB instance read-only
- Kill all open connections
Testing scenarios
The following scenarios where tested under load (insert_test.sh):
- Intentional fail-over for maintenance: shell> touch /etc/keepalived/failover shell> rm -f /etc/keepalived/failover
- Stopping keepalived: shell> systemctl stop keepalived shell> systemctl start keepalived
- Stopping MariaDB node: shell> systemctl stop mariadb shell> systemctl start mariadb
- Reboot server: shell> reboot
- Simulation of split-brain: shell> ip link set enp0s9 down shell> ip link set enp0s9 up
Problems
Problems we faced during set-up and testing were:
- SElinux/AppArmor
- Firewall
Keepalived controlling 2 virtual IPs
A second scenario we wanted to build is a MariaDB master/master GTID based replication cluster with 2 VIP addresses. This is to achieve either a read-only VIP and a read/write VIP or to have half of the load on one master and half of the load on the other master:
For this scenario we used the same scripts but a slightly different keepalived configuration:
# # /etc/keepalived/keepalived.conf # global_defs { notification_email { root@localhost dba@example.com } notification_email_from root@master1 # master2 on the other node smtp_server localhost 25 router_id MARIADB_MM enable_script_security } # Health checks vrrp_script chk_mysql { script "/usr/sbin/pidof mysqld" weight 2 # Is relevant for the diff in priority interval 1 # every ... seconds timeout 3 # script considered failed after ... seconds fall 3 # number of failures for K.O. rise 1 # number of success for OK } vrrp_script chk_failover { script "/etc/keepalived/chk_failover.sh" weight -4 # Is relevant for the diff in priority interval 1 # every ... seconds timeout 1 # script considered failed after ... seconds fall 1 # number of failures for K.O. rise 1 # number of success for OK } # Main configuration vrrp_instance VI_MM_VIP1 { state MASTER # BACKUP on the other side interface enp0s9 # private heartbeat interface priority 100 # Higher means: elected first (BACKUP: 99) virtual_router_id 42 # ID for all nodes of Cluster group unicast_src_ip 192.168.56.101 # Our private IP address unicast_peer { 192.168.56.102 # Peers private IP address } # For keepalived communication authentication { auth_type PASS auth_pass Secr3t! } # VIP to move around virtual_ipaddress { 192.168.1.99/24 dev enp0s8 # public interface for VIP } # Check health of local system. See vrrp_script above. track_script { chk_mysql chk_failover } # ANY state transit is triggered notify /etc/keepalived/keepalived_notify.sh smtp_alert # send notification during state transit } vrrp_instance VI_MM_VIP2 { state BACKUP # MASTER on the other side interface enp0s9 # private heartbeat interface priority 99 # Higher means: elected first (MASTER: 100) virtual_router_id 43 # ID for all nodes of Cluster group unicast_src_ip 192.168.56.101 # Our private IP address unicast_peer { 192.168.56.102 # Peers private IP address } # For keepalived communication authentication { auth_type PASS auth_pass Secr3t! } # VIP to move around virtual_ipaddress { 192.168.1.98/24 dev enp0s8 # public interface for VIP } # Check health of local system. See vrrp_script above. track_script { chk_mysql chk_failover } # ANY state transit is triggered notify /etc/keepalived/keepalived_notify.sh smtp_alert # send notification during state transit }Taxonomy upgrade extras: mariadbmaster-masterkeepalivedVIPreplicationGTID
MariaDB master/master GTID based replication with keepalived VIP
Some of our customers still want to have old-style MariaDB master/master replication clusters. Time goes by, new technologies appear but some old stuff still remains.
The main problem in a master/master replication set-up is to make the service highly available for the application (applications typically cannot deal with more than one point-of-contact). This can be achieved with a load balancer (HAproxy, Galera Load Balancer (GLB), ProxySQL or MaxScale) in front of the MariaDB master/master replication cluster. But the load balancer by it-self should also become highly available. And this is typically achieved by a virtual IP (VIP) in front of one of the load balancers. To make operations of the VIP more handy the VIP is controlled by a service like keepalived or corosync.
Because I like simple solutions (I am a strong believer in the KISS principle) I thought about avoiding the load balancer in the middle and attach the VIP directly to the master/master replication servers and let them to be controlled by keepalived as well.
Important: A master/master replication set-up is vulnerable to split-brain situations. Neither keepalived nor the master/master replication helps you to avoid conflicts and in any way to prevent this situation. If you are sensitive to split-brain situations you should look for Galera Cluster. Keepalived is made for stateless services like load balancers, etc. but not databases.
Set-up a MariaDB master/master replication clusterBecause most of the Linux distributions have a bit old versions of software delivered we use the MariaDB 10.2 repository from the MariaDB website:
# # /etc/yum.repos.d/MariaDB-10.2.repo # # MariaDB 10.2 CentOS repository list - created 2017-11-08 20:32 UTC # http://downloads.mariadb.org/mariadb/repositories/ # [mariadb] name = MariaDB baseurl = http://yum.mariadb.org/10.2/centos7-amd64 gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB gpgcheck=1Then we install the MariaDB server and start it:
shell> yum makecache shell> yum install MariaDB-server MariaDB-client shell> systemctl start mariadb shell> systemctl enabled mariadbFor the MariaDB master/master replication set-up configuration we use the following parameters:
# # /etc/my.cnf # [mysqld] server_id = 1 # 2 on the other node log_bin = binlog-m1 # binlog-m2 on the other node log_slave_updates = 1 gtid_domain_id = 1 # 2 on the other node gtid_strict_mode = On auto_increment_increment = 2 auto_increment_offset = 1 # 2 on the other node read_only = On # super_read_only for MySQL 5.7 and newerThen we close the master/master replication ring according to: Starting with empty server.
mariadb> SET GLOBAL gtid_slave_pos = ""; mariadb> CHANGE MASTER TO master_host="192.168.56.101", master_user="replication" , master_use_gtid=current_pos; mariadb> START SLAVE;Installing keepalived
Literature:
The next step is to install and configure keepalived. This can be done as follows:
shell> yum install keepalived shell> systemctl enable keepalivedImportant: In my tests I got crashes and core dumps with keepalived which disappeared after a full upgrade of CentOS 7.
Configuring keepalivedThe most important part is the keepalived configuration file:
# # /etc/keepalived/keepalived.conf # global_defs { notification_email { root@localhost dba@example.com } notification_email_from root@master1 # master2 on the other node smtp_server localhost 25 router_id MARIADB_MM enable_script_security } # Health checks vrrp_script chk_mysql { script "/usr/sbin/pidof mysqld" weight 2 # Is relevant for the diff in priority interval 1 # every ... seconds timeout 3 # script considered failed after ... seconds fall 3 # number of failures for K.O. rise 1 # number of success for OK } vrrp_script chk_failover { script "/etc/keepalived/chk_failover.sh" weight -4 # Is relevant for the diff in priority interval 1 # every ... seconds timeout 1 # script considered failed after ... seconds fall 1 # number of failures for K.O. rise 1 # number of success for OK } # Main configuration vrrp_instance VI_MM_VIP { state MASTER # BACKUP on the other side interface enp0s9 # private heartbeat interface priority 100 # Higher means: elected first (BACKUP: 99) virtual_router_id 42 # ID for all nodes of Cluster group debug 0 # 0 .. 4, seems not to work? unicast_src_ip 192.168.56.101 # Our private IP address unicast_peer { 192.168.56.102 # Peers private IP address } # For keepalived communication authentication { auth_type PASS auth_pass Secr3t! } # VIP to move around virtual_ipaddress { 192.168.1.99/24 dev enp0s8 # public interface for VIP } # Check health of local system. See vrrp_script above. track_script { chk_mysql # If File /etc/keepalived/failover is touched failover is triggered # Similar can be reached when priority is lowered followed by a reload chk_failover } # When node becomes MASTER this script is triggered notify_master "/etc/keepalived/keepalived_master.sh --user=root --password= --wait=yes --variable=read_only" # When node becomes SLAVE this script is triggered notify_backup "/etc/keepalived/keepalived_backup.sh --user=root --password= --kill=yes --variable=read_only" # Possibly fault and stop should also call keepalived_backup.sh to be on the safe side... notify_fault "/etc/keepalived/keepalived_fault.sh arg1 arg2" notify_stop "/etc/keepalived/keepalived_stop.sh arg1 arg2" # ANY state transit is triggered notify /etc/keepalived/keepalived_notify.sh smtp_alert # send notification during state transit }With the command:
shell> systemctl restart keepalivedthe service is started and/or the configuration is reloaded.
The scripts we used in the configuration file are the following:
chk_failover.sh keepalived_backup.sh keepalived_fault.sh keepalived_master.sh keepalived_notify.sh keepalived_stop.sh#!/bin/bash # # /etc/keepalived/keepalived_notify.sh # TYPE=${1} NAME=${2} STATE=${3} PRIORITY=${4} TS=$(date '+%Y-%m-%d_%H:%M:%S') LOG=/etc/keepalived/keepalived_notify.log echo $TS $0 $@ >>${LOG}
#!/bin/bash # # /etc/keepalived/chk_failover.sh # /usr/bin/stat /etc/keepalived/failover 2>/dev/null 1>&2 if [ ${?} -eq 0 ] ; then exit 1 else exit 0 fi
To make MariaDB master/master replication more robust against replication problems we took the following (configurable) actions on the database side:
Getting the MASTER role:
- Waiting for catch-up replication
- Make the MariaDB instance read/write
Getting the BACKUP role:
- Make the MariaDB instance read-only
- Kill all open connections
Testing scenarios
The following scenarios where tested under load (insert_test.sh):
- Intentional fail-over for maintenance: shell> touch /etc/keepalived/failover shell> rm -f /etc/keepalived/failover
- Stopping keepalived: shell> systemctl stop keepalived shell> systemctl start keepalived
- Stopping MariaDB node: shell> systemctl stop mariadb shell> systemctl start mariadb
- Reboot server: shell> reboot
- Simulation of split-brain: shell> ip link set enp0s9 down shell> ip link set enp0s9 up
Problems
Problems we faced during set-up and testing were:
- SElinux/AppArmor
- Firewall
Keepalived controlling 2 virtual IPs
A second scenario we wanted to build is a MariaDB master/master GTID based replication cluster with 2 VIP addresses. This is to achieve either a read-only VIP and a read/write VIP or to have half of the load on one master and half of the load on the other master:
For this scenario we used the same scripts but a slightly different keepalived configuration:
# # /etc/keepalived/keepalived.conf # global_defs { notification_email { root@localhost dba@example.com } notification_email_from root@master1 # master2 on the other node smtp_server localhost 25 router_id MARIADB_MM enable_script_security } # Health checks vrrp_script chk_mysql { script "/usr/sbin/pidof mysqld" weight 2 # Is relevant for the diff in priority interval 1 # every ... seconds timeout 3 # script considered failed after ... seconds fall 3 # number of failures for K.O. rise 1 # number of success for OK } vrrp_script chk_failover { script "/etc/keepalived/chk_failover.sh" weight -4 # Is relevant for the diff in priority interval 1 # every ... seconds timeout 1 # script considered failed after ... seconds fall 1 # number of failures for K.O. rise 1 # number of success for OK } # Main configuration vrrp_instance VI_MM_VIP1 { state MASTER # BACKUP on the other side interface enp0s9 # private heartbeat interface priority 100 # Higher means: elected first (BACKUP: 99) virtual_router_id 42 # ID for all nodes of Cluster group unicast_src_ip 192.168.56.101 # Our private IP address unicast_peer { 192.168.56.102 # Peers private IP address } # For keepalived communication authentication { auth_type PASS auth_pass Secr3t! } # VIP to move around virtual_ipaddress { 192.168.1.99/24 dev enp0s8 # public interface for VIP } # Check health of local system. See vrrp_script above. track_script { chk_mysql chk_failover } # ANY state transit is triggered notify /etc/keepalived/keepalived_notify.sh smtp_alert # send notification during state transit } vrrp_instance VI_MM_VIP2 { state BACKUP # MASTER on the other side interface enp0s9 # private heartbeat interface priority 99 # Higher means: elected first (MASTER: 100) virtual_router_id 43 # ID for all nodes of Cluster group unicast_src_ip 192.168.56.101 # Our private IP address unicast_peer { 192.168.56.102 # Peers private IP address } # For keepalived communication authentication { auth_type PASS auth_pass Secr3t! } # VIP to move around virtual_ipaddress { 192.168.1.98/24 dev enp0s8 # public interface for VIP } # Check health of local system. See vrrp_script above. track_script { chk_mysql chk_failover } # ANY state transit is triggered notify /etc/keepalived/keepalived_notify.sh smtp_alert # send notification during state transit }Taxonomy upgrade extras: mariadbmaster-masterkeepalivedVIPreplicationGTID
Galera Load Balancer the underestimated wallflower
There are some pretty sophisticated Load Balancers for Galera Clusters setups out in the market (ProxySQL, MaxScale, HAproxy, ...). They have many different exotic features. You can nearly do everything with them. But this comes at the cost of complexity. Non of them is simple any more.
A widely underestimated Load Balancer solution for Galera Cluster setups is the Galera Load Balancer from Codership. It is an simple Load Balancer solution which serves all of our daily needs when it comes to Galera Cluster. Unfortunately this product is not much promoted by the software vendor himself.
Installation of Galera Load BalancerThis starts with the installation. There are no packages ready to install. You have to compile Galera Load Balancer yourself. FromDual provides some compiled packages or can help you building and installing it.
You can get the Galera Load Balancer sources from Github. The binaries are built straight forward:
shell> git clone https://github.com/codership/glb shell> cd glb/ shell> ./bootstrap.sh shell> ./configure shell> make shell> make installIf you prefer a binary tar ball as I do, you can run the following commands instead of make install:
shell> TARGET=glb-1.0.1-linux-$(uname -m) shell> mkdir -p ${TARGET}/sbin ${TARGET}/lib ${TARGET}/share/glb shell> cp src/glbd ${TARGET}/sbin/ shell> cp src/.libs/libglb.a src/.libs/libglb.so* ${TARGET}/lib/ shell> cp files/* ${TARGET}/share/glb/ shell> cp README NEWS COPYING CONTRIBUTORS.txt CONTRIBUTOR_AGREEMENT.txt ChangeLog BUGS AUTHORS shell> tar czf ${TARGET}.tar.gz ${TARGET} shell> rm -rf ${TARGET}Configuration of Galera Load Balancer
The Galera Load Balancer is configured in a file called glbd which must be located under /etc/sysconfig/gldb (Red Hat and its derivatives) or /etc/default/glbd (Debian and its derivatives). I did not find any option to tell Galera Load Balancer where to search for a configuration file.
The Galera Load Balancer parameters are documented here.
Starting and Stopping Galera Load BalancerThis means for me I have to specify all my parameters on the command line:
product/glb/sbin/glbd --threads 8 --max_conn 500 \ --round --fifo /home/mysql/run/glbd.fifo --control 127.0.0.1:3333 \ 127.0.0.1:3306 \ 192.168.1.1:3306:1 192.168.1.2:3306:2 192.168.1.3:3306:1An equivalent configuration file would look as follows:
# # /etc/sysconfig/glbd.cfg # LISTEN_ADDR="127.0.0.1:3306" CONTROL_ADDR="127.0.0.1:3333" CONTROL_FIFO="/home/mysql/run/glbd.fifo" THREADS="8" MAX_CONN="500" DEFAULT_TARGETS="192.168.1.1:3306:1 192.168.1.2:3306:2 192.168.1.3:3306:1" OTHER_OPTIONS="--round"Stopping Galera Load Balancer is simple: killall glbd
Galera Load Balancer operations
Beside starting and stopping Galera Load Balancer you also want to look into it. This can be done with the following 2 commands:
echo getinfo | nc -q 1 127.0.0.1 3333 echo getstats | nc -q 1 127.0.0.1 3333Or if you want to have it in a more top/vmstat like style:
watch -n 1 "echo getstats | nc -q 1 127.0.0.1 3333" watch -n 1 -d "echo getinfo | nc -q 1 127.0.0.1 3333"More interesting are operations like draining and undraining a Galera Cluster node from the Galera Load Balancer. To drain a Galera Cluster node for example for maintenance (kernel upgrade?) you can run the following command:
echo "192.168.1.2:3306:0" | nc 127.0.0.1 3333To undrain the node again it works like this:
echo "192.168.1.2:3306:2" | nc 127.0.0.1 3333Unfortunately Galera Load Balancer does not memorize the weight (:2).
If you want to remove or add a node from/to the Galera Load Balancer this works as follows:
echo "192.168.1.2:3306:-1" | nc 127.0.0.1 3333 echo "192.168.1.2:3306:1" | nc 127.0.0.1 3333Further Galera Load Balancer operation tasks you can find in the documentation.
Taxonomy upgrade extras: Galera Clusterglbload balancerOperationsGalera Load Balancer the underestimated wallflower
There are some pretty sophisticated Load Balancers for Galera Clusters setups out in the market (ProxySQL, MaxScale, HAproxy, ...). They have many different exotic features. You can nearly do everything with them. But this comes at the cost of complexity. Non of them is simple any more.
A widely underestimated Load Balancer solution for Galera Cluster setups is the Galera Load Balancer from Codership. It is an simple Load Balancer solution which serves all of our daily needs when it comes to Galera Cluster. Unfortunately this product is not much promoted by the software vendor himself.
Installation of Galera Load BalancerThis starts with the installation. There are no packages ready to install. You have to compile Galera Load Balancer yourself. FromDual provides some compiled packages or can help you building and installing it.
You can get the Galera Load Balancer sources from Github. The binaries are built straight forward:
shell> git clone https://github.com/codership/glb shell> cd glb/ shell> ./bootstrap.sh shell> ./configure shell> make shell> make installIf you prefer a binary tar ball as I do, you can run the following commands instead of make install:
shell> TARGET=glb-1.0.1-linux-$(uname -m) shell> mkdir -p ${TARGET}/sbin ${TARGET}/lib ${TARGET}/share/glb shell> cp src/glbd ${TARGET}/sbin/ shell> cp src/.libs/libglb.a src/.libs/libglb.so* ${TARGET}/lib/ shell> cp files/* ${TARGET}/share/glb/ shell> cp README NEWS COPYING CONTRIBUTORS.txt CONTRIBUTOR_AGREEMENT.txt ChangeLog BUGS AUTHORS shell> tar czf ${TARGET}.tar.gz ${TARGET} shell> rm -rf ${TARGET}Configuration of Galera Load Balancer
The Galera Load Balancer is configured in a file called glbd which must be located under /etc/sysconfig/gldb (Red Hat and its derivatives) or /etc/default/glbd (Debian and its derivatives). I did not find any option to tell Galera Load Balancer where to search for a configuration file.
The Galera Load Balancer parameters are documented here.
Starting and Stopping Galera Load BalancerThis means for me I have to specify all my parameters on the command line:
product/glb/sbin/glbd --threads 8 --max_conn 500 \ --round --fifo /home/mysql/run/glbd.fifo --control 127.0.0.1:3333 \ 127.0.0.1:3306 \ 192.168.1.1:3306:1 192.168.1.2:3306:2 192.168.1.3:3306:1An equivalent configuration file would look as follows:
# # /etc/sysconfig/glbd.cfg # LISTEN_ADDR="127.0.0.1:3306" CONTROL_ADDR="127.0.0.1:3333" CONTROL_FIFO="/home/mysql/run/glbd.fifo" THREADS="8" MAX_CONN="500" DEFAULT_TARGETS="192.168.1.1:3306:1 192.168.1.2:3306:2 192.168.1.3:3306:1" OTHER_OPTIONS="--round"Stopping Galera Load Balancer is simple: killall glbd
Galera Load Balancer operations
Beside starting and stopping Galera Load Balancer you also want to look into it. This can be done with the following 2 commands:
echo getinfo | nc -q 1 127.0.0.1 3333 echo getstats | nc -q 1 127.0.0.1 3333Or if you want to have it in a more top/vmstat like style:
watch -n 1 "echo getstats | nc -q 1 127.0.0.1 3333" watch -n 1 -d "echo getinfo | nc -q 1 127.0.0.1 3333"More interesting are operations like draining and undraining a Galera Cluster node from the Galera Load Balancer. To drain a Galera Cluster node for example for maintenance (kernel upgrade?) you can run the following command:
echo "192.168.1.2:3306:0" | nc 127.0.0.1 3333To undrain the node again it works like this:
echo "192.168.1.2:3306:2" | nc 127.0.0.1 3333Unfortunately Galera Load Balancer does not memorize the weight (:2).
If you want to remove or add a node from/to the Galera Load Balancer this works as follows:
echo "192.168.1.2:3306:-1" | nc 127.0.0.1 3333 echo "192.168.1.2:3306:1" | nc 127.0.0.1 3333Further Galera Load Balancer operation tasks you can find in the documentation.
Taxonomy upgrade extras: Galera Clusterglbload balancerOperationsFind evil developer habits with log_queries_not_using_indexes
Recently I switched on the MariaDB slow query logging flag log_queries_not_using_indexes just for curiosity on one of our customers systems:
mariadb> SHOW GLOBAL VARIABLES LIKE 'log_quer%'; +-------------------------------+-------+ | Variable_name | Value | +-------------------------------+-------+ | log_queries_not_using_indexes | OFF | +-------------------------------+-------+ mariadb> SET GLOBAL log_queries_not_using_indexes = ON;A tail -f on the MariaDB Slow Query Log caused a huge flickering on my screen.
I got to see about 5 times per second the following statement sequence in the Slow Query Log:
So at least 5 times 95 ms (5 x (92 + 3) = 475 ms) per 1000 ms (48%) where spent in these 2 statements which are running quite fast but do not use an index (long_query_time was set to 2 seconds).
So I estimate, that this load job can be speed up at least by factor 2 when using the LAST_INSERT_ID() function correctly not considering the possible reduction of network traffic (throughput and response time).
To show the problem I made a little test case:
mariadb> INSERT INTO test VALUES (NULL, 'Some data', NULL); mariadb> SELECT LAST_INSERT_ID() from test; +------------------+ | LAST_INSERT_ID() | +------------------+ | 1376221 | ... | 1376221 | +------------------+ 1048577 rows in set (0.27 sec)The response time of this query will linearly grow with the amount of data as long as they fit into memory and the response time will explode as soon as the table does not fit into memory any more. In addition the network traffic would be reduced by about 8 Mbyte (1 Mio rows x BIGINT UNSIGNED (64-bit) + some header per row?) per second (6-8% of the network bandwidth of a 1 Gbit network link).
shell> ifconfig lo | grep bytes RX bytes:2001930826 (2.0 GB) TX bytes:2001930826 (2.0 GB) shell> ifconfig lo | grep bytes RX bytes:2027289745 (2.0 GB) TX bytes:2027289745 (2.0 GB)The correct way of doing the query would be:
mariadb> SELECT LAST_INSERT_ID(); +------------------+ | last_insert_id() | +------------------+ | 1376221 | +------------------+ 1 row in set (0.00 sec)The response time is below 10 ms.
So why is the first query taking so long an consuming so many resources? To get an answer to this question the MariaDB Optimizer can tell us more with the Query Execution Plan (QEP):
mariadb> EXPLAIN SELECT LAST_INSERT_ID() FROM test; +------+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+ | 1 | SIMPLE | test | index | NULL | PRIMARY | 4 | NULL | 1048577 | Using index | +------+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+ mariadb> EXPLAIN FORMAT=JSON SELECT LAST_INSERT_ID() FROM test; { "query_block": { "select_id": 1, "table": { "table_name": "test", "access_type": "index", "key": "PRIMARY", "key_length": "4", "used_key_parts": ["id"], "rows": 1048577, "filtered": 100, "using_index": true } } }The database does a Full Index Scan (FIS, other call it a Index Fast Full Scan (IFFS)) on the Primary Key (column id).
The Query Execution Plan of the second query looks as follows:
mariadb> EXPLAIN SELECT LAST_INSERT_ID(); +------+-------------+-------+------+---------------+------+---------+------+------+----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+------+------+----------------+ | 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used | +------+-------------+-------+------+---------------+------+---------+------+------+----------------+ mariadb> EXPLAIN FORMAT=JSON SELECT LAST_INSERT_ID(); { "query_block": { "select_id": 1, "table": { "message": "No tables used" } } }Taxonomy upgrade extras: query tuningOptimizerindexindex scanlast_insert_idexplainslowlog
Find evil developer habits with log_queries_not_using_indexes
Recently I switched on the MariaDB slow query logging flag log_queries_not_using_indexes just for curiosity on one of our customers systems:
mariadb> SHOW GLOBAL VARIABLES LIKE 'log_quer%'; +-------------------------------+-------+ | Variable_name | Value | +-------------------------------+-------+ | log_queries_not_using_indexes | OFF | +-------------------------------+-------+ mariadb> SET GLOBAL log_queries_not_using_indexes = ON;A tail -f on the MariaDB Slow Query Log caused a huge flickering on my screen.
I got to see about 5 times per second the following statement sequence in the Slow Query Log:
So at least 5 times 95 ms (5 x (92 + 3) = 475 ms) per 1000 ms (48%) where spent in these 2 statements which are running quite fast but do not use an index (long_query_time was set to 2 seconds).
So I estimate, that this load job can be speed up at least by factor 2 when using the LAST_INSERT_ID() function correctly not considering the possible reduction of network traffic (throughput and response time).
To show the problem I made a little test case:
mariadb> INSERT INTO test VALUES (NULL, 'Some data', NULL); mariadb> SELECT LAST_INSERT_ID() from test; +------------------+ | LAST_INSERT_ID() | +------------------+ | 1376221 | ... | 1376221 | +------------------+ 1048577 rows in set (0.27 sec)The response time of this query will linearly grow with the amount of data as long as they fit into memory and the response time will explode as soon as the table does not fit into memory any more. In addition the network traffic would be reduced by about 8 Mbyte (1 Mio rows x BIGINT UNSIGNED (64-bit) + some header per row?) per second (6-8% of the network bandwidth of a 1 Gbit network link).
shell> ifconfig lo | grep bytes RX bytes:2001930826 (2.0 GB) TX bytes:2001930826 (2.0 GB) shell> ifconfig lo | grep bytes RX bytes:2027289745 (2.0 GB) TX bytes:2027289745 (2.0 GB)The correct way of doing the query would be:
mariadb> SELECT LAST_INSERT_ID(); +------------------+ | last_insert_id() | +------------------+ | 1376221 | +------------------+ 1 row in set (0.00 sec)The response time is below 10 ms.
So why is the first query taking so long an consuming so many resources? To get an answer to this question the MariaDB Optimizer can tell us more with the Query Execution Plan (QEP):
mariadb> EXPLAIN SELECT LAST_INSERT_ID() FROM test; +------+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+ | 1 | SIMPLE | test | index | NULL | PRIMARY | 4 | NULL | 1048577 | Using index | +------+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+ mariadb> EXPLAIN FORMAT=JSON SELECT LAST_INSERT_ID() FROM test; { "query_block": { "select_id": 1, "table": { "table_name": "test", "access_type": "index", "key": "PRIMARY", "key_length": "4", "used_key_parts": ["id"], "rows": 1048577, "filtered": 100, "using_index": true } } }The database does a Full Index Scan (FIS, other call it a Index Fast Full Scan (IFFS)) on the Primary Key (column id).
The Query Execution Plan of the second query looks as follows:
mariadb> EXPLAIN SELECT LAST_INSERT_ID(); +------+-------------+-------+------+---------------+------+---------+------+------+----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+------+------+----------------+ | 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used | +------+-------------+-------+------+---------------+------+---------+------+------+----------------+ mariadb> EXPLAIN FORMAT=JSON SELECT LAST_INSERT_ID(); { "query_block": { "select_id": 1, "table": { "message": "No tables used" } } }Taxonomy upgrade extras: query tuningOptimizerindexindex scanlast_insert_idexplainslowlog
Storing BLOBs in the database
We have sometimes discussions with our customers whether to store LOBs (Large Objects) in the database or not. To not rephrase the arguments again and again I have summarized them in the following lines.
The following items are more or less valid for all large data types (BLOB, TEXT and theoretically also for JSON and GIS columns) stored in a MySQL or MariaDB (or any other relational) database.
The idea of a relational table based data-store is to store structured data (numbers, data and short character strings) to have a quick write and read access to them.
And yes, you can also store other things like videos, huge texts (PDF, emails) or similar in a RDBMS but they are principally not designed for such a job and thus non optimal for the task. Software vendors implement such features not mainly because it makes sense but because users want it and the vendors want to attract users (or their managers) with such features (USP, Unique Selling Proposition). Here also one of my Mantras: Use the right tool for the right task:
The main topics to discuss related to LOBs are: Operations, performance, economical reasons and technical limitations.
Disadvantages of storing LOBs in the database- The database will grow fast. Operations will become more costly and complicated.
- Backup and restore will become more costly and complicated for the admin because of the increased size caused by LOBs.
- Backup and restore will take longer because of the same reason.
- Database and table management functions (OPTIMIZE, ALTER, etc.) will take longer on big LOB tables.
- Smaller databases need less RAM/disk space and are thus cheaper.
- Smaller databases fit better into your RAM and are thus potentially faster (RAM vs disk access).
- RDBMS are a relatively slow technology (compared to others). Reading LOBs from the database is significantly slower than reading LOBs from a filer for example.
- LOBs stored in the database will spoil your database cache (InnoDB Buffer Pool) and thus possibly slow down other queries (does not necessarily happen with more sophisticated RBDMS).
- LOB size limitation of 1 Gbyte in reality (max_allowed_packet, theoretically limit is at 4 Gbyte) for MySQL/MariaDB.
- Expensive, fast database store (RAID-10, SSD) is wasted for something which can be stored better on a cheap slow file store (RAID-5, HDD).
- It is programmatically often more complicated to get LOBs from a database than from a filer (depends on your libraries).
Advantages of storing LOBs in the database
- Atomicity between data and LOB is guaranteed by transactions (is it really in MySQL/MariaDB?).
- There are no dangling links (reference from data to LOB) between data and LOB.
- Data and LOB are from the same point in time and can be included in the same backup.
- Data and LOB can be transferred simultaneously to other machines, by database replication or dump/restore.
- Applications can use the same mechanism to get the data and the LOB. Remote access needs no separate file transfer for the LOB.
Conclusion
So basically you have to balance the advantages vs. the disadvantages of storing LOBs in the database and decided what arguments are more important in your case.
If you have some more good arguments pro or contra storing LOBs in the database please let me know.
LiteratureCheck also various articles on Google.
Taxonomy upgrade extras: blobtextlobdesign