Impact of DDL Operations on Aurora MySQL Readers

Recently, I came across an interesting investigation about long-running transactions getting killed on an Aurora reader instance. In this article, I will explain why it is advisable to avoid long-running transactions on Aurora readers when you execute frequent DDL operations on the writer, or at least why you should be aware of how a DDL can impact your Aurora readers.

Aurora uses a shared volume, often called a cluster volume, that manages the data for all the DB instances in the cluster. A cluster can contain a single Aurora instance or multiple instances (a writer and one or more Aurora read replicas).

Aurora replicas connect to the same storage volume as the primary DB instance and support only read operations. Adding a new Aurora replica therefore does not create a new copy of the table data; the replica simply attaches to the shared cluster volume that already contains all the data.

This design can cause problems on replica instances when DDL operations are involved.

Below is one such example.


Start a transaction on the reader:
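A sketch of such a transaction (the database and table names, `percona` and `t1`, are placeholders, not from the original post):

```sql
-- On the reader: open a transaction and leave it running
USE percona;
START TRANSACTION;
SELECT COUNT(*) FROM t1;
-- do not COMMIT yet; keep the transaction open
```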


While the transaction is ongoing on the reader, execute any DDL against the same table on the writer:
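For example (the table and column names are placeholders, any DDL against the table will do):

```sql
-- On the writer: any DDL against the same table, e.g.
ALTER TABLE t1 ADD COLUMN c2 INT;
```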


Check the status on the reader; the transaction will have been terminated forcefully:
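What the reader session observes can vary, but it is typically an abrupt termination along these lines (an illustrative sketch, not output captured from a live cluster):

```sql
-- Back on the reader, the open transaction is gone; the next
-- statement in that session fails (e.g. with a lost-connection
-- or aborted-transaction style error) and the work is rolled back
SELECT COUNT(*) FROM t1;
```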


Now, let's see what happens when a backup is running from a reader node and the writer receives a DDL for the particular table being backed up.

Take a logical backup of a table using mydumper:
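For example (endpoint, credentials, and paths are placeholders; exact flags can vary between mydumper versions):

```shell
mydumper --host=<aurora-reader-endpoint> --user=backup_user \
  --password='<password>' -B percona -o /backups/percona
```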

While the backup is ongoing on the reader, execute any DDL against the same table on the writer.

Check the status of the backup; it will have failed because its connection on the reader was killed.

So what is the issue?

As stated above, Aurora does not use binary log-based replication to replicate data to the readers. The underlying storage is shared by all the instances (writer and readers) within a cluster, and Aurora coordinates them internally with, let's say, "magic".

Now, because of this "magic" in Aurora, when you perform a DDL operation on the writer instance, the reader instances are forced to terminate any long-running transactions so that the writer can acquire the metadata lock and the DDL operation can continue.

Hence, if you use Aurora replicas for logical backups (mysqldump/mydumper) or run long-running jobs on a reader instance, you may encounter the issue described above.

To understand this better let’s see what happens when we perform any DDL operation in a binary log-based replication environment and in the Aurora replication environment. Following are the high-level steps when any DDL gets executed.

Binary log-based replication:

  • On the primary, ALTER TABLE will try to acquire the metadata lock
  • Once the lock is acquired the ALTER TABLE progresses
  • Once the ALTER TABLE operation completes, the DDL statement will be written to the binary log
  • On the replicas, the IO thread will copy this event to the local relay log
  • The SQL thread will apply the query from the relay log
  • On the replica, the SQL thread will also need to acquire the metadata lock
  • Once the lock is acquired, the ALTER TABLE starts executing on the replica
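This metadata-lock behavior on a plain MySQL server is easy to observe with two sessions (a sketch; the table name `t1` is a placeholder):

```sql
-- Session 1: open a transaction that touches t1 and leave it open
START TRANSACTION;
SELECT * FROM t1 LIMIT 1;

-- Session 2: the DDL waits for the metadata lock instead of
-- killing session 1; it stays in state
-- "Waiting for table metadata lock" until session 1 commits
ALTER TABLE t1 ADD COLUMN c2 INT;

-- Session 3: observe the waiting DDL
SHOW PROCESSLIST;
```

This is the key contrast with Aurora: in binary log-based replication the DDL waits for open transactions, while Aurora readers kill them.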

Aurora replication:

  • On the writer, the ALTER TABLE will try to acquire the metadata lock
  • At the same time, it checks whether there are any open transactions on any of the reader nodes; if so, it kills those transactions forcefully
  • Once the metadata lock is acquired, the ALTER TABLE progresses
  • After the ALTER TABLE completes, the modified structure will be visible to the replicas because of the same underlying storage

What are the issues?

  1. If you are performing frequent DDL operations on your database, it is not advisable to take logical backups from an Aurora reader.
  2. Long-running transactions on the readers may get killed.

What is the solution?

Create an external replica of the Aurora cluster using binary log-based replication. This replica can be used to take logical backups or to run long-running queries without being interrupted by DDL operations on the Aurora writer instance.
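At a high level, the procedure is: enable binary logging on the Aurora cluster (set binlog_format in the cluster parameter group and reboot the writer), retain binlogs long enough to seed the replica, restore a backup on the external server, and point it at the cluster. A sketch of the MySQL-side steps (host, user, and binlog coordinates are placeholders; see the AWS documentation for the full procedure):

```sql
-- On the Aurora writer: keep binary logs long enough to seed the replica
CALL mysql.rds_set_configuration('binlog retention hours', 24);

-- On the external replica, after restoring the backup taken from Aurora:
CHANGE MASTER TO
  MASTER_HOST='<aurora-cluster-endpoint>',
  MASTER_USER='repl_user',
  MASTER_PASSWORD='<password>',
  MASTER_LOG_FILE='<binlog-file-from-backup-metadata>',
  MASTER_LOG_POS=<position>;
START SLAVE;
```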

You may follow the Percona blog to create an external replica from Aurora using MyDumper or review the AWS documentation page.

4 Comments
Rahul

Interesting, do you have insight into whether the behavior is the same in earlier Aurora versions, i.e., 1.x and 2.x?

Prakash Sharma

Good blog to learn about Aurora replicas' behavior.

Vivek Buddhadev

Thanks for the detailed explanation. It's useful.

Sam

This is not satisfactory behaviour for this cloud vendor's product. Moreover, you have mentioned that in classic MySQL the metadata lock is held for the whole duration of the ALTER, which is not correct. It is taken only at the beginning and end of the statement, to capture the snapshot and keep the ALTER consistent.