In this blog post, we describe an improvement in Percona XtraBackup 8.0.33-28 (PXB) that significantly reduces the time needed to prepare a backup before the restore operation. This improvement also significantly reduces the time required for a new node to join a Percona XtraDB Cluster (PXC).

Percona XtraDB Cluster uses Percona XtraBackup for SST (State Snapshot Transfer) from one node to another. When a new node joins the cluster, an SST is performed to transfer the data from the DONOR to the JOINER. The JOINER uses PXB to stream the data directory from the DONOR, and it must prepare the backup before using it. We observed that when the DONOR has a huge number of tablespaces (on the order of one million), XtraBackup on the JOINER side could not complete preparing the data (xtrabackup --prepare).
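
As a quick illustration, you can confirm on a running cluster that SST is handled by XtraBackup (xtrabackup-v2 is the standard method in PXC 8.0) and watch a JOINER's progress; the status values shown are examples:

    # On any node: confirm the SST method (xtrabackup-v2 uses Percona XtraBackup)
    mysql -e "SHOW VARIABLES LIKE 'wsrep_sst_method';"

    # On the JOINER while it synchronizes: observe the node state
    mysql -e "SHOW STATUS LIKE 'wsrep_local_state_comment';"   # e.g., Joining, Joined, Synced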

Percona XtraBackup copies the InnoDB data files. The copied data is internally inconsistent because the server modifies the data files concurrently while they are being copied. Percona XtraBackup then performs crash recovery on the files to make the database consistent and usable again. This is called the 'prepare' operation (xtrabackup --prepare).
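
For reference, a minimal backup-and-prepare sequence looks like the following sketch (the target directory is illustrative):

    # Take the backup while the server is running
    xtrabackup --backup --target-dir=/data/backup

    # Run crash recovery on the copied files to make them consistent
    xtrabackup --prepare --target-dir=/data/backup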

The XtraBackup --prepare operation is done in two phases:

  1. Redo log apply
  2. Undo log apply

In the redo apply phase, the modifications recorded in the redo log are applied to pages. This phase has no concept of a row or a transaction, so it cannot by itself make the database consistent with respect to transactions: the server may flush or write changes made by uncommitted transactions to the redo log, and XtraBackup still applies those modifications during redo apply. The redo apply phase does not undo them; for that, we have to use the undo logs.

In the undo log apply phase (also known as the rollback phase), the changes required to undo a transaction are read from the undo log pages, applied to the data pages again (for example, by writing the old values back), and written to disk. After this phase, all transactions that were uncommitted at the time of the backup are rolled back.
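
To see the rollback phase end to end, here is a hypothetical sequence (the table name and paths are illustrative): leave a transaction uncommitted while the backup runs, and the prepared backup will not contain its changes.

    # Keep a transaction open and uncommitted while the backup runs
    mysql test -e "BEGIN; INSERT INTO t1 VALUES (42); SELECT SLEEP(600);" &

    # Take and prepare the backup while the INSERT is still uncommitted
    xtrabackup --backup  --target-dir=/data/backup
    xtrabackup --prepare --target-dir=/data/backup

    # A server restored from /data/backup will not contain the row (42):
    # the undo apply phase rolled the INSERT back.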

Undo log records are of two types: INSERT undo log records and UPDATE undo log records. A DELETE MARK record is considered a subtype of the UPDATE undo log record.

The format is as shown below:

[Figure: Undo log record format]

When the server writes these records, it doesn't write the index/table information along with each record; it writes only a "table_id" as part of the undo log record. The table_id is used to fetch the table schema. The table schema and the key fields from the undo log record are used to create an index search tuple (key). This search tuple (key) is used to find the record on which to perform the undo operation.

So, given a table_id, how do you get the table schema/definition?

After the "data dictionary" (DD) engine and the DD cache are initialized on a server, storage engines can ask for a table definition. For example, InnoDB asks for a table definition based on the table_id, also known as the "se_private_id".
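
On a running server, you can observe this association yourself; for example (illustrative table name), INFORMATION_SCHEMA.INNODB_TABLES exposes the InnoDB table_id together with the tablespace that holds the table:

    mysql -e "SELECT TABLE_ID, NAME, SPACE
              FROM INFORMATION_SCHEMA.INNODB_TABLES
              WHERE NAME = 'test/t1';"
    # TABLE_ID  NAME     SPACE
    # 1234      test/t1  57        (values are examples)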

Percona XtraBackup, unlike the server, doesn't have access to the "data dictionary" (DD). Initializing a DD engine and cache would add complexity and other server dependencies, so XtraBackup cannot simply behave like a server to access a table object.

Percona XtraBackup initializes the InnoDB engine and requires the "InnoDB table object," aka dict_table_t, for all its purposes (rollback, export, etc.). To build it, XtraBackup relies on Serialized Dictionary Information (SDI), a JSON representation of the table. For InnoDB tablespaces, this information is stored within the tablespace itself. From 8.0 onward, the IBD file is "self-describing": the table schema is available within the IBD file.

Let's take a look at an example table.
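
A minimal definition along the following lines will do (hypothetical table; any InnoDB table works):

    mysql -e "CREATE TABLE test.t1 (
                id   INT PRIMARY KEY,
                name VARCHAR(32)
              ) ENGINE=InnoDB;"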

The CREATE TABLE statement creates a file called t1.ibd in the test directory, for example, <mysql datadir>/test/t1.ibd. So t1.ibd contains information about the table structure (columns, their types, how many indexes, the columns in each index, foreign keys, etc.) as SDI. Use the "ibd2sdi" tool to extract the SDI from an IBD file.
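
For instance (output abridged; the datadir path and the exact JSON fields vary by installation and server version):

    ibd2sdi /var/lib/mysql/test/t1.ibd
    # ["ibd2sdi",
    #  {
    #    "type": 1,
    #    "object": {
    #      "dd_object_type": "Table",
    #      "dd_object": {
    #        "name": "t1",
    #        "columns": [ { "name": "id", ... }, { "name": "name", ... } ],
    #        "indexes": [ ... ]
    #      }
    #    }
    #  }
    # ]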

As you can see from the SDI output above, the table name is in the "dd_object:name" field, and the column information is stored in the "dd_object:columns" array.

Old design (until Percona XtraBackup 8.0.33-27)

XtraBackup reads the SDI from *every* IBD file and loads all tables from each IBD into the cache as non-evictable. Essentially, the LRU cache is disabled by loading tables as non-evictable: every table remains in memory until XtraBackup exits.

Problems with this approach:

  • Loading tables that are not required for rollback.
  • Unnecessary IO operations from reading SDI pages of tables.
  • Loading unnecessary tables increases the time required to --prepare.
  • Occupies memory and can lead to OOM.
  • Crashes the XtraBackup prepare operation if the backup directory contains a huge number of tables/IBD files.
  • A node joining the PXC cluster requires more memory and takes a long time to join the cluster.

Why did XtraBackup load tables as 'non-evictable'? Can't we just load them as evictable to solve the problem? Suppose a table is evicted and later has to be loaded again: how will XtraBackup know which tablespace (IBD) contains the evicted table? It would have to scan every IBD again to find it.

New design (from Percona XtraBackup 8.0.33-28)

To load tables as evictable, a relationship between the table_id and the tablespace (space_id) that contains the table must be established. This is done by scanning the B-tree pages of the data dictionary tables mysql.indexes and mysql.index_partitions.

After this table_id→space_id relationship is established, it is used during transaction rollback. In the new design, user tables are loaded only if there is a transaction to roll back on them.
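
XtraBackup builds this map by scanning the dictionary B-trees directly, since no server is running during prepare. For comparison, on a live server an equivalent mapping is visible through INFORMATION_SCHEMA (illustrative query):

    mysql -e "SELECT i.TABLE_ID, t.NAME, i.SPACE
              FROM INFORMATION_SCHEMA.INNODB_INDEXES i
              JOIN INFORMATION_SCHEMA.INNODB_TABLES  t USING (TABLE_ID)
              WHERE t.NAME = 'test/t1';"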

The new design is as follows:

Tables are evicted from the cache when the cache size limit is reached, or by the background master thread.

Benefits of the new design for xtrabackup --prepare:

  1. Uses less memory.
  2. Uses less IO.
  3. Faster prepare.
  4. Completes successfully even with a huge number of tables.
  5. A node completes the SST process faster and joins the PXC cluster quickly.
  6. A node requires less memory to join the PXC cluster.

Benchmarks

[Figure: Percona XtraBackup benchmarks]

We also ran xtrabackup --prepare on backup directories of other sizes: 10K, 50K, 100K, and 250K tables. The performance improvement is as follows:

[Figure: xtrabackup --prepare performance improvement for 10K-250K tables]

Conclusion

As you can see, from Percona XtraBackup 8.0.33-28 onward, xtrabackup --prepare is faster and more memory-efficient with the dictionary cache. The improvement depends on the number of tablespace files (IBDs) in the backup directory. The time taken for a new node to join the PXC cluster is also significantly reduced, as the SST process completes faster.

Comments

Frederic Descamps

Hello!
Nice blog post. I have some comments/questions:

  1. Even if you have the table_id -> space_id mapping, you would still need to open all the ibd files to know the space_id. The saving you get is in reading/processing the SDI info, right?
  2. "It is done by scanning the B-tree pages of the data dictionary tables mysql.indexes and mysql.index_partitions" – how about in-flight transactions on these tables? IIUC, they aren't rolled back yet, right?
  3. "How will XtraBackup know the tablespace (IBD) that contains the evicted table? It must scan every IBD again to find the evicted table." – how about creating a mapping from space_id to file name once you have already opened it? The next open would then just look into this map to get the file to be opened.

Cheers

Jean-François Gagné

What does Percona Server for MySQL do on crash recovery? Does it have the same problem (table_id to tablespace mapping)? Does it also have the memory problem, or does it also scan the data dictionary?

An alternative solution could have been to build this mapping at the backup phase: as all ibd files are read anyway, that could be the time to extract the mapping for later use in prepare. Does the new design have upsides over this alternative design?