What is the quickest way to load data into the database?
We had some really exciting problems to solve for a recent customer! Especially since the database wasn't exactly small.
Here are some key figures: CPU: 2 sockets x 24 cores x 2 threads = 96 vCores, 756 GB of RAM, 2 x 10 TB PCIe SSDs in RAID-10, 7 TB of data, several thousand clients, rapidly growing.
Current throughput, averaged over 30 days: 1 M SELECTs/min, 56 k INSERTs/min, 44 k UPDATEs/min and 7 k DELETEs/min, with a strong upward trend. Application and queries are not consistently optimised. The database configuration is 'state of the art' but has not been verified with benchmarks. CPU utilisation is around 50% on average, higher at peak times. The I/O system still has headroom.
The customer collects position and other device data and stores it in the database. In other words, a classic IoT problem (with time series, index clustered table, etc.).
The question they asked was: what is the fastest way to copy data from one table (pending data, a kind of queue) to another table (final data, per client)?
The data flow looks something like this:
+------------+
| IoT Device |--+
+------------+   \
                  \     +-----+
+------------+     \    | AS  |   +--------------+  Processing  +------------+
| IoT Device |------+-->|     |-->| Pending data |------------->| Final data |
+------------+     /    | 400 |   +--------------+   of data    +------------+
                  /     +-----+
+------------+   /
| IoT Device |--+
+------------+
Three different variants for copying the data were considered.
Variant 1: INSERT and DELETE (simplest form)
The simplest variant is a simple INSERT and DELETE. This variant is particularly problematic because MariaDB/MySQL and PostgreSQL have AUTOCOMMIT enabled by default (here, here, here and here).
To help you visualise this a little better, here is some pseudocode:
// 20 k rows
for (i = 1; i <= 2000; i++) {
    SELECT * FROM pending LIMIT 10;
    foreach ( row ) {
        INSERT INTO final;
        -- implicit COMMIT
        DELETE FROM pending WHERE id = row[id];
        -- implicit COMMIT
    }
}
So if we want to copy 20 k rows, this variant causes 40 k COMMITs (fsyncs) and 42 k network round trips!
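In real code, variant 1 could look roughly like this with PHP/PDO (a minimal sketch; the table and column names pending, final, id and payload are assumptions for illustration, not those of load_data.php):

<?php
// Variant 1 sketch: AUTOCOMMIT is on, so every INSERT and DELETE is its own
// fsynced transaction. Schema (pending/final with id, payload) is assumed.
$pdo = new PDO('mysql:host=127.0.0.1;port=3306;dbname=test', 'app', 'secret',
               [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

for ($i = 1; $i <= 2000; $i++) {
    $rows = $pdo->query('SELECT id, payload FROM pending LIMIT 10')
                ->fetchAll(PDO::FETCH_ASSOC);
    foreach ($rows as $row) {
        // one network round trip and one implicit COMMIT per statement
        $pdo->prepare('INSERT INTO final (id, payload) VALUES (?, ?)')
            ->execute([$row['id'], $row['payload']]);
        $pdo->prepare('DELETE FROM pending WHERE id = ?')
            ->execute([$row['id']]);
    }
}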
Variant 2: START TRANSACTION and INSERT and DELETE
This variant is used by more experienced database developers. Here is the corresponding pseudocode:
// 20 k rows
for (i = 1; i <= 2000; i++) {
    SELECT * FROM pending LIMIT 10;
    START TRANSACTION;
    foreach ( row ) {
        INSERT INTO final;
        DELETE FROM pending WHERE id = row[id];
    }
    COMMIT;
}
If we want to copy 20 k rows in this example, this variant causes only 2 k COMMITs (fsyncs), 20 times fewer! But it needs 46 k network round trips (about 10% more).
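As a PHP/PDO sketch (same assumed schema as in the variant 1 sketch), variant 2 only adds the explicit transaction around each batch:

<?php
// Variant 2 sketch: one explicit transaction per batch of 10 rows, so only
// one COMMIT (fsync) per batch instead of one per statement.
$pdo = new PDO('mysql:host=127.0.0.1;port=3306;dbname=test', 'app', 'secret',
               [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

// Preparing the statements once is a separate optimisation and not required
// for the transaction handling that variant 2 is about.
$ins = $pdo->prepare('INSERT INTO final (id, payload) VALUES (?, ?)');
$del = $pdo->prepare('DELETE FROM pending WHERE id = ?');

for ($i = 1; $i <= 2000; $i++) {
    $rows = $pdo->query('SELECT id, payload FROM pending LIMIT 10')
                ->fetchAll(PDO::FETCH_ASSOC);
    $pdo->beginTransaction();
    foreach ($rows as $row) {
        $ins->execute([$row['id'], $row['payload']]);   // still 1 round trip each
        $del->execute([$row['id']]);                    // still 1 round trip each
    }
    $pdo->commit();
}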
Variant 3: START TRANSACTION and optimised INSERT and DELETE
This variant is a little more demanding in terms of programming. It is used when you want to get closer to the limits of what is possible. Here is the pseudocode:
// 20 k rows
for (i = 1; i <= 2000; i++) {
    SELECT * FROM pending LIMIT 10;
    START TRANSACTION;
    INSERT INTO final VALUES (...), (...), (...), (...), (...), (...), (...), (...), (...), (...);
    DELETE FROM pending WHERE id IN (...);
    COMMIT;
}
This 3rd variant also causes only 2 k COMMITs (fsyncs) for 20 k rows, but it avoids the per-row loop by batching the INSERT and DELETE statements. So we save network round trips (only 10 k) as well as CPU cycles on the database (which are difficult to scale) for parsing the queries.
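A PHP/PDO sketch of variant 3 (again with the assumed pending/final schema) builds the multi-row INSERT and the IN list dynamically:

<?php
// Variant 3 sketch: one multi-row INSERT and one DELETE ... IN (...) per batch,
// wrapped in one transaction: 5 round trips and 1 COMMIT per 10 rows.
$pdo = new PDO('mysql:host=127.0.0.1;port=3306;dbname=test', 'app', 'secret',
               [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

for ($i = 1; $i <= 2000; $i++) {
    $rows = $pdo->query('SELECT id, payload FROM pending LIMIT 10')
                ->fetchAll(PDO::FETCH_ASSOC);
    if (!$rows) { break; }

    $insSql = 'INSERT INTO final (id, payload) VALUES '
            . implode(', ', array_fill(0, count($rows), '(?, ?)'));
    $delSql = 'DELETE FROM pending WHERE id IN ('
            . implode(', ', array_fill(0, count($rows), '?')) . ')';

    $insParams = [];
    $ids       = [];
    foreach ($rows as $row) {
        $insParams[] = $row['id'];
        $insParams[] = $row['payload'];
        $ids[]       = $row['id'];
    }

    $pdo->beginTransaction();
    $pdo->prepare($insSql)->execute($insParams);   // 10 rows in 1 round trip
    $pdo->prepare($delSql)->execute($ids);         // 10 ids in 1 round trip
    $pdo->commit();
}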
Test set-up
To test the whole thing, we have prepared a small script: load_data.php
Preparation and execution with MariaDB/MySQL
You can execute these tests yourself with the following commands:
SQL> CREATE DATABASE test;
SQL> CREATE USER 'app'@'127.0.0.1' IDENTIFIED BY 'secret';
SQL> GRANT ALL ON *.* TO 'app'@'127.0.0.1';
$ ./load_data.php --database-type=mysql --database=test --host=127.0.0.1 --port=3306 --user=app --password=secret --prepare
$ for i in $(seq 5) ; do
./load_data.php --database-type=mysql --database=test --host=127.0.0.1 --port=3306 --user=app --password=secret --run --variant=1
./load_data.php --database-type=mysql --database=test --host=127.0.0.1 --port=3306 --user=app --password=secret --run --variant=2
./load_data.php --database-type=mysql --database=test --host=127.0.0.1 --port=3306 --user=app --password=secret --run --variant=3
done
$ ./load_data.php --database-type=mysql --database=test --host=127.0.0.1 --port=3306 --user=app --password=secret --clean-up
You can determine the measured values yourself with the test script.
Preparation and execution with PostgreSQL
You can execute these tests yourself with the following commands:
postgres# CREATE DATABASE test;
postgres# CREATE USER app PASSWORD 'secret';
postgres# GRANT ALL ON DATABASE test TO app;
postgres# GRANT ALL ON SCHEMA public TO app;
$ ./load_data.php --database-type=postgresql --database=test --host=127.0.0.1 --port=5432 --user=app --password=secret --prepare
$ for i in $(seq 5) ; do
./load_data.php --database-type=postgresql --database=test --host=127.0.0.1 --port=5432 --user=app --password=secret --run --variant=1
./load_data.php --database-type=postgresql --database=test --host=127.0.0.1 --port=5432 --user=app --password=secret --run --variant=2
./load_data.php --database-type=postgresql --database=test --host=127.0.0.1 --port=5432 --user=app --password=secret --run --variant=3
done
$ ./load_data.php --database-type=postgresql --database=test --host=127.0.0.1 --port=5432 --user=app --password=secret --clean-up
You can determine the measured values yourself with the test script.
Results
To avoid unnecessary discussions, we have 'only' listed the relative performance (runtime) here, as MarkC has been doing recently. We are happy to share our measured values on request. In any case, they can easily be reproduced with the test script.
Less is better:
|                          | Variant 1 | Variant 2 | Variant 3 |
|--------------------------|-----------|-----------|-----------|
| MariaDB 11.8, avg(5)     | 100.0%    | 7.5%      | 6.2%      |
| PostgreSQL 19dev, avg(5) | 100.0%    | 11.2%     | 7.0%      |
Note: the values for MariaDB/MySQL and PostgreSQL can NOT be compared directly!
And here is the graphical evaluation:


Remarks
With faster disks, the difference between variant 1 and variants 2/3 would probably not have been quite so significant. Tests on the customer's hardware showed a difference of 'only' about a factor of 5 (instead of a factor of 9 to 16).
There are certainly other ways in which this loading process can be optimised. Here are a few that come to mind:
- INSERT INTO ... SELECT * FROM (see the sketch after this list)
- LOAD DATA INFILE / COPY, if possible
- Prepared statements
- Server-side stored language (SQL/PSM, PL/pgSQL, …) :-(
- PDO fetch of the results?
- etc.
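For the first idea, here is a hedged sketch of how a batch could be moved entirely inside the database, so the rows never travel to the client at all. The DELETE ... ORDER BY ... LIMIT syntax is MariaDB/MySQL specific; PostgreSQL would need a different construct (e.g. a CTE with DELETE ... RETURNING). It also assumes a monotonically increasing id and a single loader process:

<?php
// INSERT INTO ... SELECT sketch (MariaDB/MySQL syntax, assumed schema as above):
// move up to 10 000 rows per transaction without transferring them to the client.
$pdo = new PDO('mysql:host=127.0.0.1;port=3306;dbname=test', 'app', 'secret',
               [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

$pdo->beginTransaction();
$pdo->exec('INSERT INTO final (id, payload)
            SELECT id, payload FROM pending ORDER BY id LIMIT 10000');
// Deletes the same lowest ids, assuming ids only grow and no concurrent deleter.
$pdo->exec('DELETE FROM pending ORDER BY id LIMIT 10000');
$pdo->commit();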
Maybe I should look for a profiler (Xdebug or xhprof)?
Further reading
- Load CSV files into the database
- MariaDB Prepared Statements, Transactions and Multi-Row Inserts
- How good is MySQL INSERT TRIGGER performance
This page was translated using deepl.com.

