You are here

Materialized Views with MySQL

Content

What is a Materialized View?

A Materialized View (MV) is the pre-calculated (materialized) result of a query. Unlike a simple VIEW the result of a Materialized View is stored somewhere, generally in a table. Materialized Views are used when immediate response is needed and the query where the Materialized View bases on would take to long to produce a result. Materialized Views have to be refreshed once in a while. It depends on the requirements how often a Materialized View is refreshed and how actual its content is. Basically a Materialized View can be refreshed immediately or deferred, it can be refreshed fully or to a certain point in time. MySQL does not provide Materialized Views by itself. But it is easy to build Materialized Views yourself.

Implement your own Materialized Views

A short example for how this could be done is the following query:

SELECT COUNT(*)
  FROM MyISAM_table;

returns immediate result because the counter is stored in the table header. The following query can take some seconds up to minutes:

SELECT COUNT(*)
 FROM innodb_huge;

A possible solution for this would be to create a table where all InnoDB row counts are stored in

CREATE TABLE innodb_row_count (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
  , schema_name VARCHAR(64)  NOT NULL
  , table_name  VARCHAR(64)  NOT NULL
  , row_count   INT UNSIGNED NOT NULL
);

Depending on the needed correctness of this information the table can be refreshed once a day (least used resources on the system but biggest error in result), once a hour or in most extreme case after every change (slowest)!

An other possibility would be to get the data from the information schema. But this information can be up to 20% wrong!

SELECT table_schema, table_name, table_rows
  FROM information_schema.tables
 WHERE table_type = 'BASE TABLE';

Refreshing materialized views

Materialized Views can be refreshed in different kinds. They can be refreshed:

  • never (only once in the beginning, for static data only)
  • on demand (for example once a day, for example after nightly load)
  • immediately (after each statement)

A refresh can be done in the following ways:

  • completely (slow, full from scratch)
  • deferred (fast, by a log table)

By storing the change information in a log table. Also some snapshots or time delayed states can be produced:

  • refresh up to date
  • refresh full

Hands on

To understand all this more in detail it is probably easiest to do some examples. Assume we have a table sales:

CREATE TABLE sales (
    sales_id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
  , product_name   VARCHAR(128) NOT NULL
  , product_price  DECIMAL(8,2) NOT NULL
  , product_amount SMALLINT     NOT NULL
);

INSERT INTO sales VALUES
  (NULL, 'Apple', 1.25, 1), (NULL, 'Apple', 2.40, 2)
, (NULL, 'Apple', 4.05, 3), (NULL, 'Pear', 6.30, 2)
, (NULL, 'Pear', 12.20, 4), (NULL, 'Plum', 4.85, 3)
;

SELECT * FROM sales;

And now we want to know the price sold and the money earned per product:

EXPLAIN
SELECT product_name
     , SUM(product_price) AS price_sum, SUM(product_amount) AS amount_sum
     , AVG(product_price) AS price_avg, AVG(product_amount) amount_agg
     , COUNT(*)
  FROM sales
 GROUP BY product_name
 ORDER BY price_sum;

+-------------+-------+------+---------------+------+---------------------------------+
| select_type | table | type | possible_keys | rows | Extra                           |
+-------------+-------+------+---------------+------+---------------------------------+
| SIMPLE      | sales | ALL  | NULL          |    6 | Using temporary; Using filesort |
+-------------+-------+------+---------------+------+---------------------------------+

On such a small table it is pretty fast but when you have hundreds of products and millions of sales transactions it can take minutes to hours!

Create your own Materialized View:

DROP TABLE sales_mv;
CREATE TABLE sales_mv (
    product_name VARCHAR(128)  NOT NULL
  , price_sum    DECIMAL(10,2) NOT NULL
  , amount_sum   INT           NOT NULL
  , price_avg    FLOAT         NOT NULL
  , amount_avg   FLOAT         NOT NULL
  , sales_cnt    INT           NOT NULL
  , UNIQUE INDEX product (product_name)
);

INSERT INTO sales_mv
SELECT product_name
    , SUM(product_price), SUM(product_amount)
    , AVG(product_price), AVG(product_amount)
    , COUNT(*)
  FROM sales
GROUP BY product_name;

This is up to now the easiest part! And, as expected we get the correct result:

mysql> SELECT * FROM sales_mv;
+--------------+-----------+------------+-----------+------------+-----------+
| product_name | price_sum | amount_sum | price_avg | amount_avg | sales_cnt |
+--------------+-----------+------------+-----------+------------+-----------+
| Apple        |      7.70 |          6 |   2.56667 |          2 |         3 |
| Pear         |     18.50 |          6 |      9.25 |          3 |         2 |
| Plum         |      4.85 |          3 |      4.85 |          3 |         1 |
+--------------+-----------+------------+-----------+------------+-----------+
3 rows in set (0.00 sec)

This would cover the refreshment mode "NEVER" But this is not what we generally want to do.

Refresh Materialized View on demand

Refreshing the Materialized View on demand can be implemented with a Stored Procedure as follows:

DROP PROCEDURE refresh_mv_now;

DELIMITER $$

CREATE PROCEDURE refresh_mv_now (
    OUT rc INT
)
BEGIN

  TRUNCATE TABLE sales_mv;

  INSERT INTO sales_mv
  SELECT product_name
      , SUM(product_price), SUM(product_amount)
      , AVG(product_price), AVG(product_amount)
      , COUNT(*)
    FROM sales
  GROUP BY product_name;

  SET rc = 0;
END;
$$

DELIMITER ;

To check if it works the following statement were used:

CALL refresh_mv_now(@rc);

SELECT * FROM sales_mv;

+--------------+-----------+------------+-----------+------------+-----------+
| product_name | price_sum | amount_sum | price_avg | amount_avg | sales_cnt |
+--------------+-----------+------------+-----------+------------+-----------+
| Apple        |      7.70 |          6 |   2.56667 |          2 |         3 |
| Pear         |     18.50 |          6 |      9.25 |          3 |         2 |
| Plum         |      4.85 |          3 |      4.85 |          3 |         1 |
+--------------+-----------+------------+-----------+------------+-----------+

INSERT INTO sales VALUES
  (NULL, 'Apple', 2.25, 3), (NULL, 'Plum', 3.35, 1)
, (NULL, 'Pear', 1.80, 2);

CALL refresh_mv_now(@rc);

SELECT * FROM sales_mv;

+--------------+-----------+------------+-----------+------------+-----------+
| product_name | price_sum | amount_sum | price_avg | amount_avg | sales_cnt |
+--------------+-----------+------------+-----------+------------+-----------+
| Apple        |      9.95 |          9 |    2.4875 |       2.25 |         4 |
| Pear         |     20.30 |          8 |   6.76667 |    2.66667 |         3 |
| Plum         |      8.20 |          4 |       4.1 |          2 |         2 |
+--------------+-----------+------------+-----------+------------+-----------+

To make the output a little nicer we can add a VIEW on the Materialized View table as follows:

CREATE VIEW sales_v AS
SELECT product_name, price_sum, amount_sum, price_avg, amount_avg
  FROM sales_mv;

Refresh Materialized View immediate

To do a full refresh after each statement does not make sense. But we still would like to have to proper result. To do this it is a little bit more complicated.

On every INSERT on the sales table we have to update our Materialized View. We can implement this transparently by INSERT/UPDATE/DELETE triggers on the sales table:

Now let us create the needed triggers:

DELIMITER $$

CREATE TRIGGER sales_ins
AFTER INSERT ON sales
FOR EACH ROW
BEGIN

  SET @old_price_sum = 0;
  SET @old_amount_sum = 0;
  SET @old_price_avg = 0;
  SET @old_amount_avg = 0;
  SET @old_sales_cnt = 0;

  SELECT IFNULL(price_sum, 0), IFNULL(amount_sum, 0), IFNULL(price_avg, 0)
       , IFNULL(amount_avg, 0), IFNULL(sales_cnt, 0)
    FROM sales_mv
   WHERE product_name = NEW.product_name
    INTO @old_price_sum, @old_amount_sum, @old_price_avg
       , @old_amount_avg, @old_sales_cnt
  ;

  SET @new_price_sum = @old_price_sum + NEW.product_price;
  SET @new_amount_sum = @old_amount_sum + NEW.product_amount;
  SET @new_sales_cnt = @old_sales_cnt + 1;
  SET @new_price_avg = @new_price_sum / @new_sales_cnt;
  SET @new_amount_avg = @new_amount_sum / @new_sales_cnt;

  REPLACE INTO sales_mv
  VALUES(NEW.product_name, @new_price_sum, @new_amount_sum, @new_price_avg
       , @new_amount_avg, @new_sales_cnt)
  ;

END;
$$

DELIMITER ;

DELIMITER $$

CREATE TRIGGER sales_del
AFTER DELETE ON sales
FOR EACH ROW
BEGIN

  SET @old_price_sum = 0;
  SET @old_amount_sum = 0;
  SET @old_price_avg = 0;
  SET @old_amount_avg = 0;
  SET @old_sales_cnt = 0;

  SELECT IFNULL(price_sum, 0), IFNULL(amount_sum, 0), IFNULL(price_avg, 0)
       , IFNULL(amount_avg, 0), IFNULL(sales_cnt, 0)
    FROM sales_mv
   WHERE product_name = OLD.product_name
    INTO @old_price_sum, @old_amount_sum, @old_price_avg
       , @old_amount_avg, @old_sales_cnt
  ;

  SET @new_price_sum = @old_price_sum - OLD.product_price;
  SET @new_amount_sum = @old_amount_sum - OLD.product_amount;
  SET @new_price_avg = @new_price_sum / @new_amount_sum;
  SET @new_sales_cnt = @old_sales_cnt - 1;
  SET @new_amount_avg = @new_amount_sum / @new_sales_cnt;

  REPLACE INTO sales_mv
  VALUES(OLD.product_name, @new_price_sum, @new_amount_sum
       , IFNULL(@new_price_avg, 0), IFNULL(@new_amount_avg, 0)
       , @new_sales_cnt)
  ;

END;
$$

DELIMITER ;

DELIMITER $$

CREATE TRIGGER sales_upd
AFTER UPDATE ON sales
FOR EACH ROW
BEGIN

  SET @old_price_sum = 0;
  SET @old_amount_sum = 0;
  SET @old_price_avg = 0;
  SET @old_amount_avg = 0;
  SET @old_sales_cnt = 0;

  SELECT IFNULL(price_sum, 0), IFNULL(amount_sum, 0), IFNULL(price_avg, 0)
       , IFNULL(amount_avg, 0), IFNULL(sales_cnt, 0)
    FROM sales_mv
   WHERE product_name = OLD.product_name
    INTO @old_price_sum, @old_amount_sum, @old_price_avg
       , @old_amount_avg, @old_sales_cnt
  ;

  SET @new_price_sum = @old_price_sum + (NEW.product_price - OLD.product_price);
  SET @new_amount_sum = @old_amount_sum + (NEW.product_amount
                      - OLD.product_amount);
  SET @new_sales_cnt = @old_sales_cnt;
  SET @new_price_avg = @new_price_sum / @new_sales_count;
  SET @new_amount_avg = @new_amount_sum / @new_sales_cnt;

  REPLACE INTO sales_mv
  VALUES(OLD.product_name, @new_price_sum, @new_amount_sum
       , IFNULL(@new_price_avg, 0), IFNULL(@new_amount_avg, 0)
       , @new_sales_cnt)
  ;

END;
$$

DELIMITER ;

And now let us see what the result is:

INSERT INTO sales VALUES (NULL, 'Apple', 1.25, 1);
INSERT INTO sales VALUES (NULL, 'Apple', 2.40, 2);
INSERT INTO sales VALUES (NULL, 'Apple', 4.05, 3);
INSERT INTO sales VALUES (NULL, 'Pear', 6.30, 2);
INSERT INTO sales VALUES (NULL, 'Pear', 12.20, 4);
INSERT INTO sales VALUES (NULL, 'Plum', 4.85, 3);

DELETE FROM sales WHERE sales_id = 5;
DELETE FROM sales WHERE sales_id = 4;

UPDATE sales SET product_amount = 3 where sales_id = 2;

SELECT * from sales_v;

Materialized Views with snapshotting functionality

The difference to the example above is, that the changes will not be applied immediately. The changes will be stored in a log table and the Materialized View is refreshed after a certain time period for a certain amount of time.

In addition to the example above we need an additional log table:

CREATE TABLE sales_mvl (
    product_name   VARCHAR(128) NOT NULL
  , product_price  DECIMAL(8,2) NOT NULL
  , product_amount SMALLINT     NOT NULL
  , sales_id       INT UNSIGNED NOT NULL
  , product_ts     TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP()
);

Instead of updating the mv the log is filled:

DROP TRIGGER sales_ins;

DELIMITER $$

CREATE TRIGGER sales_ins
AFTER INSERT ON sales
FOR EACH ROW
BEGIN

  INSERT INTO sales_mvl
  VALUES (NEW.product_name, NEW.product_price, NEW.product_amount
        , NEW.sales_id, NULL);

END;
$$

DELIMITER ;

DROP TRIGGER sales_del;

DELIMITER $$

CREATE TRIGGER sales_del
AFTER DELETE ON sales
FOR EACH ROW
BEGIN

  DELETE FROM sales_mvl
  WHERE sales_id = OLD.sales_id;

END;
$$

DELIMITER ;

DROP TRIGGER sales_upd;

DELIMITER $$

CREATE TRIGGER sales_upd
AFTER UPDATE ON sales
FOR EACH ROW
BEGIN

  UPDATE sales_mvl
     SET product_name = NEW.product_name
       , product_price = NEW.product_price
       , product_amount = NEW.product_amount
       , sales_id = NEW.sales_id
       , product_ts = CURRENT_TIMESTAMP()
  WHERE sales_id = OLD.sales_id;

END;
$$

DELIMITER ;

And a Stored Procedure for refreshing the Materialized View is built. Possible modes are:

  • REFRESH FULL (up to now)
  • REFRESH (up to a certain time stamp)
  • REBUILD (fully rebuild and clean MV log table)
DELIMITER $$

CREATE PROCEDURE refresh_mv (
    IN method VARCHAR(16)
  , IN ts TIMESTAMP
  , OUT rc INT
)
BEGIN

IF UPPER(method) = 'REBUILD' THEN

  TRUNCATE TABLE sales_mvl;
  TRUNCATE TABLE sales_mv;

  INSERT INTO sales_mv
  SELECT product_name
      , SUM(product_price), SUM(product_amount), AVG(product_price)
      , AVG(product_amount), COUNT(*)
    FROM sales
  GROUP BY product_name
  ;

ELSEIF UPPER(method) = 'REFRESH FULL' THEN

  REPLACE INTO sales_mv
  SELECT product_name, SUM(price_sum), SUM(amount_sum)
       , SUM(price_sum)/SUM(sales_cnt), SUM(amount_sum)/SUM(sales_cnt)
       , SUM(sales_cnt)
    FROM (
          SELECT product_name, price_sum,  amount_sum, sales_cnt
            FROM sales_mv
          UNION ALL
          SELECT product_name
              , SUM(product_price), SUM(product_amount), COUNT(*)
            FROM sales_mvl
          GROUP BY product_name
        ) x
  GROUP BY product_name
  ;

  TRUNCATE TABLE sales_mvl;

  SET rc = 0;
ELSEIF UPPER(method) = 'REFRESH' THEN

  REPLACE INTO sales_mv
  SELECT product_name, SUM(price_sum), SUM(amount_sum)
       , SUM(price_sum)/SUM(sales_cnt), SUM(amount_sum)/SUM(sales_cnt)
       , SUM(sales_cnt)
    FROM (
          SELECT product_name, price_sum,  amount_sum, sales_cnt
            FROM sales_mv
          UNION ALL
          SELECT product_name
              , SUM(product_price), SUM(product_amount), COUNT(*)
            FROM sales_mvl
          WHERE product_ts < ts
          GROUP BY product_name
        ) x
  GROUP BY product_name
  ;

  DELETE
    FROM sales_mvl
   WHERE product_ts < ts
  ;

  SET rc = 0;
ELSE
  SET rc = 1;
END IF;

END;
$$

DELIMITER ;

And now let us test if it works correctly...

INSERT INTO sales VALUES (NULL, 'Apple', 1.25, 1);
wait some time

INSERT INTO sales VALUES (NULL, 'Apple', 2.40, 2);
wait some time

INSERT INTO sales VALUES (NULL, 'Apple', 4.05, 3);
wait some time

INSERT INTO sales VALUES (NULL, 'Pear', 6.30, 2);
wait some time

INSERT INTO sales VALUES (NULL, 'Pear', 12.20, 4);
wait some time

INSERT INTO sales VALUES (NULL, 'Plum', 4.85, 3);

SELECT * from sales_mvl;
SELECT * from sales_v;

CALL refresh_mv('REFRESH', '2006-11-06 10:57:55', @rc);
SELECT * from sales_v;

CALL refresh_mv('REFRESH FULL', NULL, @rc);
SELECT * from sales_v;

CALL refresh_mv('REBUILD', NULL, @rc);
SELECT * from sales_v;

Some performance benchmarks for our Materialized Views:

For the performance benchmark a set of 100 k sales rows was created. This set of rows should simulate a continuous INSERT flow into our database. The whole test was done when all files and all database table were cache in memory. To avoid side effects during measurement no other activities should happen on this machine.

Load into table sales without any triggers as baseline:

Test time [s] Ref.
LOAD DATA INFILE 0.90 [1]
Multi row INSERT 2.85 [2]
Single row INSERT 13.2 [3]
Single row INSERT without LOCK TABLE 15.9 [4]
FULL REFRESH of sales_mv 0.64 [5]
SELECT on MV :-) 0.00 [6]

Load into table sales with some Materialized View functionality:

Test time [s] Ref.
LOAD DATA INFILE
with REFRESH IMMEDIATE
40.8 [1]
Single row INSERT without LOCK TABLE
with REFRESH IMMEDIATE
109 [4]
Single rows INSERT without LOCK TABLE
with REFRESH DEFERRED
22.8 [4]
Refresh MV with about 40% of the data 0.82
Refresh MV with next 40% of the data 0.98
Refresh MV with last 20% of the data 0.14

Outlook

  • If you have a continuous data flow and/or concurrent transactions it may be better to use InnoDB instead of MyISAM tables.
  • Locking the table in the Triggers/Stored Procedure may prevent wrong data this has to be tested.
  • Delayed INSERT may help to speed up the load process.
  • Eventually it makes sense to build/refresh the Materialized Vied in parallel?

Conclusion

  • Triggers in MySQL (5.0.27) are not terribly fast.
  • Materialized Views can help to speed up queries which rely heavily on some aggregated results.
  • If INSERT speed is not a matter this feature can help to decrease read load on the system.
  • It is a little tricky to implement.
  • MV come only in play if huge amount of data are used and database are not memory based anymore.

Literature

[ 1 ]
LOAD DATA
INFILE '/tmp/sales_outfile_100k.dmp'
INTO TABLE sales;

[ 2 ]
time mysql -u root test << EOF;
TRUNCATE TABLE sales;
SOURCE /tmp/sales_multirowinsert_100k.sql
EOF

[ 3 ]
time mysql -u root test << EOF;
TRUNCATE TABLE sales;
SOURCE tmp/sales_extended_100k.sql
EOF

[ 4 ]
time mysql -u root test << EOF;
TRUNCATE TABLE sales;
SOURCE /tmp/sales_extended_nolock_100k.sql
EOF

[ 5 ]
INSERT INTO sales_mv
SELECT product_name
    , SUM(product_price), SUM(product_amount)
    , AVG(product_price), AVG(product_amount)
    , COUNT(*)
  FROM sales
GROUP BY product_name;

[ 6 ]
SELECT *
  FROM sales_mv;

[ 7 ] Wikipedia: Materialized Views