You are here

MariaDB and MySQL Database Consolidation

We see at various customers the request for consolidating their MariaDB and MySQL infrastructure. The advantage of such a measure is clear in the first step: Saving costs! And this request comes typically from managers. But what we unfortunately see rarely is to question this request from the IT engineering perspective. Because it comes, as anything in life, with some "costs". So, saving costs with consolidation on one side comes with "costs" for operation complexity on the other side.

To give you some arguments for arguing with managers we collected some topics to consider before consolidating:

  • Bigger Database Instances are more demanding in handling than smaller ones:
    • Backup and Restore time takes longer. Copying files around takes longer, etc.
    • Possibly your logical backup with mysqldump does not restore any longer in a reasonable amount of time (Mean Time to Repair/Recover (MTTR) is not met any more). You have to think about some physical backup methods including MariaDB or MySQL Enterprise Backup solutions.
    • Consolidated database instances typically contain many different schemas of various different applications. In case of problems you typically want to restore and possibly recover only one single schema and not all schemas. And this becomes much more complicated (depending on your backup strategy). MariaDB/MySQL tooling is not yet (fully) prepared for this situation (#17365). Possibly your old backup strategy is not adequate any more?
    • Binary Logs are written globally, not per schema. Have you considered how to do a PiTR for one or several schemas on your consolidated instance? Not an easy game.
    • When you restore a schema you do not want the application interfering with your restore. How can you properly exclude the one application from your database instance while you are restoring? Locking accounts (possible only with MariaDB 10.4 and MySQL 5.7 and newer). Tricks like --skip-networking, adding Firewall rules, --read-only, database port change (--port=3307), do not work any more (as easy)!
    • In short the costs are: Restore/Recovery Operations become more demanding!
  • Do NOT mix schemas of different criticalities into the same database instance! The worst cases we have seen were some development schemas which were on the same high-availability Cluster like highly critical transactional systems. The developers did some nasty things on their development systems (which IMHO is OK for them on a development system). What nobody considered in this case was that the troubles from the development schema brought down the whole production schema which was located on the same machine... Cost: Risk of failure of your important services caused by some non-important services AND planning becomes more expensive and you need to know more about all instances and other instances.
  • This phenomena is also called Noisy Neighbor effect. Noisy Neighbors become a bigger issue with consolidated systems. You have to know much more in detail what you and everybody else is doing on the system! Do you...? Costs are: More know-how is required, better education and training of people, more clever people, better planning, better monitoring, etc.
  • When you consolidate different applications into one system it becomes more critical than the previous ones on their own. So you have to think about High-Availability solutions. Costs are: 1 to 4 new instances (for HA), more complexity, more know-how, more technologies... Do you plan to buy an Enterprise Support subscription?
  • Do NOT mix different maintenance windows (Asia vs. Europe vs. America) or daily online-business and nightly job processing. You get shorter maintenance windows. Costs are: Better planning is needed, costly night and weekend maintenance time, etc...

    Europe12:00
    China19:00(7 hours ahead of us)
    US east07:00(5 hours behind us)
    US west04:00(8 hours behind us)
  • Resource Fencing becomes more tricky. Within the same instance resource fencing becomes more tricky and is not really doable atm. MySQL 8.0 shows some firsts steps with the Resource Groups but this is pretty complicated and is by far not complete and usable yet. A better way would be to install several instances on the same machine an fence them with some O/S means like Control Groups. This comes at the costs of know-how, complexity and more complicated set-ups.
  • Naming conflicts can happen: Application a) is called `wiki` and application b) is called `wiki` as well and for some reasons you cannot rename them (any more).
  • Monitoring becomes much more demanding and needs to be done more fine grained. You want to know exactly what is going on your system because it can easily have some side effects on many different schemas/applications. Example of today: We were running out of kernel file descriptors (file-max) and we did not recognize it in the beginning.
  • Consolidated things are a much a higher Bulk Risk (this is true also for SAN or Virtualisation Clusters). When you have an outage not only one application is down but the whole company is down. We have seen this already for SAN and Virtualisation Clusters and we expect to see that soon also on highly consolidated Database Clusters. Costs: Damage on the company is bigger for one incident.
  • Different applications have different configuration requirements which possibly conflict with other requirements from other applications (Jira from Atlassian is a good example for this).
    Server variables cannot be adjusted any more according to somebody’s individual wishes...
    • sql_mode: Some old legacy applications still require ONLY_FULL_GROUP_BY) :-(
    • The requirements are conflicting: Performance/fast vs. Safe/durability: innodb_flush_log_at_trx_commit, sync_binlog, crash-safe binary logging, etc.
    • Transaction isolation: transaction_isolation = READ-COMMITTED (old: tx_isolation, Jira again as an example) vs. REPEATABLE-READ (default). Other applications which do not assume, that transaction isolation behaviour changes. And cannot cope with it. Have you ever asked your developers if their application can cope with a different transaction isolation levels? :-) Do they know what you are talking about?
    • Character set (utf8_bin for Jira as example again), which can be changed globally or on a schema level, but it has to be done correctly for all participants.
  • Some applications require MariaDB some application require MySQL. They are not the same databases any more nowadays (8.0 vs. 10.3/10.4). So you cannot consolidate them (easily).
  • You possibly get a mixture of persistent connections (typically Java with connection pooling) and non-persistent connections (typically PHP and other languages). Which causes different database behaviour, which has an impact on how you configure the database instance. Which is more demanding and needs more knowledge of the database AND the application or you solve it with more RAM.
  • You need to know much more about you application to understand what it does and how could it interfere with others...
  • When you consolidate more and more schemas into your consolidated database server you have to adjust your database setting as well from time to time (innodb_buffer_pool_size, table_open_cache, table_definition_cache, O/S File descriptors, etc). And possibly add more RAM, CPU and stronger I/O. When is your network saturated? Have you thought about this already?
  • Upgrading MariaDB/MySQL and changes in database configuration becomes more demanding in communication and coordination. Potentially several development teams are affected. And they possibly have even different requirements/needs in O/S, forks and database versions. Or are even not willing or capable to update.
  • If you have different schemas on the same Instance it is easier to access data in different schemas at the same time in the same query. This can cause (unwanted) dependencies between those schemas. The database becomes the interface between applications. Here you have to be very restrictive with user privileges to avoid these dependencies. From an architecture point of view it would be more preferable to use clearly defined interfaces outside of the database. For example APIs. But those APIs require much more development resources than a simple SQL query. The problem comes later: If you want to separate the schemas again into different instances the effort is increasing significantly to split/rewrite the JOIN queries and the underlying data sets. Or the depending schemas must be moved all together which causes longer downtimes for applications and requires more coordination between teams.

This leads us to the result that consolidation let us save some costs on infrastructure but adds additional costs on complexity, skills etc. Theses costs will grow exponentially and thus at some point it is not worth the effort any more. This will end up in not only one big consolidated instance but possibly in a hand full of them.

Where this point is for you you have to find yourself...

Alternatives to consolidating everything into one instance

  • 1 Machine can contain 1 to many Database Instances can contain 1 to many Schemas. Instead of putting all schemas into one machine, think about installing several instances on one machine. This comes at the cost of more complexity. MyEnv will help you to manage this additional complexity.
  • 1 Machine can contain 1 to many Virtual Machines (VMs, kvm, XEN, VMWare, etc.) can contain 1 to many Instance(s) can contain 1 to many Schemas. This comes at the cost of even more complexity and pretty complex technology (Virtualization).

A big thanks to Antoniya K. for here valuable feedback!

Taxonomy upgrade extras: 

Comments

We currently have the situation that a customer cannot upgrade its hyper-consolidated Galera Cluster for more than a year because various different applications cannot, are not willing to or do not have the time to test against the newer version.

What what we suggest here: Create a new Galera Cluster with the new version and move over one project/application after the other to the new Cluster. This reduces complexity. You can do slightly progress and the pressure for the "slow-upgrader" increases over time.

Shinguzcomment