Percona XtraBackup is an open source hot backup utility for MySQL-based servers. To take consistent and hot backup, it uses various locking methods, especially for non-transactional storage engine tables. This blog post discusses the cause and possible solution for queries with ​Waiting for table flush state in processlist when taking backups using Percona XtraBackup. Only MySQL 5.7 version is affected by this, as per my tests.

Type of locks taken by Percona XtraBackup

Before discussing the main issue, let’s learn about the type of locks used by Percona XtraBackup to take consistent backups. Percona XtraBackup uses backup locks as a lightweight alternative to FLUSH TABLES WITH READ LOCK. This feature is available in Percona Server for MySQL 5.6+. MySQL 8.0 allows acquiring an instance-level backup lock via the LOCK INSTANCE FOR BACKUP statement.

Locking is only done for MyISAM storage engine tables and other non-InnoDB tables after Percona XtraBackup finishes backing up all InnoDB/XtraDB data and logs. With backup locks, Percona XtraBackup uses LOCK TABLES FOR BACKUP automatically to copy non-InnoDB data and avoid blocking DML queries that modify InnoDB tables.

So for Percona Server for MySQL 5.7 and Percona XtraDB Cluster 5.7, it uses the following locking command while performing MyISAM and other non-InnoDB tables:

For upstream MySQL 5.7 versions, it uses the following locking commands:

MySQL 8.0 uses the LOCK INSTANCE FOR BACKUP command.

Queries with ​Waiting for table flush state in processlist output while XtraBackup is running, this issue is visible in upstream MySQL 5.7 versions only; Percona Server for MySQL/Percona XtraDB Cluster/MySQL 8.0 will not have this issue since they use a different locking mechanism.

Root cause

When MySQL has a long-running query for a table/s, running XtraBackup will run FLUSH NO_WRITE_TO_BINLOG TABLES and all types of new queries on a table where a long-running query is running will have Waiting for table flush state in processlist.

This will be resolved after the long-running query is finished, and the next backup will complete successfully.

Test case example:

#Start sysbench load on the database:

Session 1: Ran the following query:

#Start XtraBackup:

Session 2: while session1 long-query is running:

Session 3: while session1 long-query is running:

As we can see from the example, all new queries on joinit table are in the Waiting for table flush state. Queries on other tables are not affected.

Once the long query is completed, we don’t see the Waiting for table flush state in processlist, and backup will continue.

Possible solutions

 

List of options that you can use with Percona XtraBackup:

  • xtrabackup –ftwrl-wait-timeout (seconds) – how long to wait for a good moment. Default is 0, not to wait.
  • xtrabackup –ftwrl-wait-query-type – which long queries should be finished before FLUSH TABLES WITH READ LOCK is run. Default is ALL.
  • xtrabackup –ftwrl-wait-threshold (seconds) – how long the query should be running before we consider it long running and potential blocker of global lock.
  • xtrabackup –kill-long-queries-timeout (seconds) – how much time we give for queries to complete after FLUSH TABLES WITH READ LOCK is issued before start to kill. Default if 0, not to kill.
  • xtrabackup –kill-long-query-type – which queries should be killed once kill-long-queries-timeout has expired. The default is SELECT.

Note: Killing select queries is OK but for DML/DDL it could lead to data inconsistency) so better to retry backup later/during non-peak hours.

References:

How Percona XtraBackup works (8.0)

How Percona XtraBackup works (2.4)

Improved FLUSH TABLES WITH READ LOCK handling

Subscribe
Notify of
guest

0 Comments
Inline Feedbacks
View all comments