Importance of Delay Replica in Data Recovery: Recover Deleted Records

What is a Delay Replica and how does it help?

MySQL Replication is useful, easy to set up, and used for very different purposes. For example:

split reads and writes
run data mining or reporting processes on them
disaster recovery

To learn more, check out How Does MySQL Replication Work?

It’s important to mention that a replication server is not a backup by itself. A mistake on the source, for example, a DROP DATABASE or an UPDATE without a WHERE clause will be replicated nearly instantly to all replica servers. Simply having a replica is not going to be helpful here. How can we avoid that kind of mistake? By having a replica server that intentionally lags behind.

We can never avoid human error in database infrastructure operations. But rollback to the last known good condition from delayed Source/Replica is the best thing recommended during the entire database infrastructure corruption scenarios.

Delayed replication can be used for several purposes:

To test how the system behaves when there is a lag. For example, in an application, a lag might be caused by a heavy load on the replica. However, it can be difficult to generate this load level. Delayed replication can simulate the lag without having to simulate the load. It can also be used to debug conditions related to a lagging replica.
To inspect what the database looked like in the past, without having to reload a backup. For example, by configuring a replica with a delay of one week, if you then need to see what the database looked like before the last few days’ worth of development, the delayed replica can be inspected.
In our case, as we have six hours replication delay, we can recover the non-problematic state of the table by going back to six hours delayed replica in case there was the wrong DML on the source.

Testing the Delay Replica which can help us recover the deleted record

Let’s understand the situation first: Someone deleted the data on a Percona XtraDB Cluster 8 (PXC) server accidentally. As per architecture, we always configure a delayed replica. We have stopped the replication on the delayed replica and now we will restore the deleted records.

From the initial investigation, we came to know from the application operation team that the below query was executed.

delete from Schema_g2.per_sch order by id desc;

1	delete from Schema_g2.per_sch order by id desc;

Almost 20k records were deleted. Let us do our pre-reqs and initial investigation based on the requirements we have.

In this article, PXC-8 is our source and pxc-backup is our delayed backup Replica. Click through to learn about our product Percona XtraDB Cluster and our backup solution Percona XtraBackup.

Percona XtraDB Cluster

All records for Schema_g2.per_sch were deleted from PXC-8 by mistake.
We have stopped the delayed replica pxc-backup to retrieve these records. All deleted records are present here right now. Below are the number of records we have until replication was stopped.

Pxc-backup > select count(*) from Schema_g2.per_sch;
+----------+
| count(*) |
+----------+
|    21762 |
+----------+

Pxc-backup > select count(*) from Schema_g2.per_sch;

+----------+

| count(*) |

+----------+

| 21762 |

+----------+

At this point, we have already stopped the backup replica.

After deletion, below are the new records inserted on the PXC-8 source for this table.

PXC-8 (none)> select count(*) from Schema_g2.per_sch;
+----------+
| count(*) |
+----------+
|      215 |
+----------+

PXC-8 (none)> select count(*) from Schema_g2.per_sch;

+----------+

| count(*) |

+----------+

| 215 |

+----------+

id column is auto_inc so new rows inserted have next values.

id int(11) NOT NULL AUTO_INCREMENT,

1	id int(11) NOT NULL AUTO_INCREMENT,

PXC-8 (none)> select * from Schema_g2.per_sch;
+--------+---------------+-----------+-------+
| id     | permission_id | school_id | grant |
+--------+---------------+-----------+-------+
| 178852 |           446 |     48887 |     0 |
| 178853 |           448 |     48887 |     0 |

...

PXC-8 (none)> select * from Schema_g2.per_sch;

+--------+---------------+-----------+-------+

+--------+---------------+-----------+-------+

| 178852 | 446 | 48887 | 0 |

| 178853 | 448 | 48887 | 0 |

...

Records on the backup server:

Pxc-backup > select * from Schema_g2.per_sch limit 10;
+-----+---------------+-----------+-------+
| id  | permission_id | school_id | grant |
+-----+---------------+-----------+-------+
|   0 |             0 |         0 |     0 |
| 105 |           426 |         1 |     0 |
| 429 |            89 |        16 |     0 |
| 431 |            93 |        16 |     0 |

...

| 178629 |           194 |     35758 |     0 |
| 178630 |           195 |     35758 |     0 |
| 178631 |          8239 |     35758 |     0 |
+--------+---------------+-----------+-------+

Pxc-backup > select * from Schema_g2.per_sch limit 10;

+-----+---------------+-----------+-------+

+-----+---------------+-----------+-------+

| 0 | 0 | 0 | 0 |

| 105 | 426 | 1 | 0 |

| 429 | 89 | 16 | 0 |

| 431 | 93 | 16 | 0 |

...

| 178629 | 194 | 35758 | 0 |

| 178630 | 195 | 35758 | 0 |

| 178631 | 8239 | 35758 | 0 |

+--------+---------------+-----------+-------+

A few records between 178631 and 178852 are missing on the backup node and the replica was stopped in between.
Exact position before the drop was executed on source: (Application team pinged this, we verified it as an UPDATE query before the DELETE).

START SLAVE UNTIL MASTER_LOG_FILE = 'PXC-8-bin.001611', MASTER_LOG_POS = 613996753;

1	START SLAVE UNTIL MASTER_LOG_FILE = 'PXC-8-bin.001611', MASTER_LOG_POS = 613996753;

From binlog we see below:

#210922 11:44:05 server id 8  end_log_pos 613996753 CRC32 0xee39f244    Query   thread_id=36995659      exec_time=0     error_code=0
SET TIMESTAMP=1632300245/*!*/;
BEGIN
/*!*/;
# at 613996753
#210922 11:44:05 server id 8  end_log_pos 613997049 CRC32 0x92aea136    Table_map: `Schema_g2`.`usr` mapped to number 109
# at 613997049
#210922 11:44:05 server id 8  end_log_pos 613997697 CRC32 0x1132b4ad    Update_rows: table id 109 flags: STMT_END_F
### UPDATE `Schema_g2`.`usr`

.......

#210922 11:44:04 server id 8  end_log_pos 613997872 CRC32 0xf41297bc    Query   thread_id=37823889      exec_time=0     error_code=0
SET TIMESTAMP=1632300244/*!*/;
BEGIN
/*!*/;
# at 613997872
#210922 11:44:04 server id 8  end_log_pos 613997943 CRC32 0xe341c435    Table_map: `Schema_g2`.`per_sch` mapped to number 664
# at 613997943
#210922 11:44:04 server id 8  end_log_pos 614006154 CRC32 0xd71595b1    Delete_rows: table id 664

#210922 11:44:05 server id 8 end_log_pos 613996753 CRC32 0xee39f244 Query thread_id=36995659 exec_time=0 error_code=0

SET TIMESTAMP=1632300245/*!*/;

BEGIN

/*!*/;

# at 613996753

#210922 11:44:05 server id 8 end_log_pos 613997049 CRC32 0x92aea136 Table_map: `Schema_g2`.`usr` mapped to number 109

# at 613997049

#210922 11:44:05 server id 8 end_log_pos 613997697 CRC32 0x1132b4ad Update_rows: table id 109 flags: STMT_END_F

### UPDATE `Schema_g2`.`usr`

.......

#210922 11:44:04 server id 8 end_log_pos 613997872 CRC32 0xf41297bc Query thread_id=37823889 exec_time=0 error_code=0

SET TIMESTAMP=1632300244/*!*/;

BEGIN

/*!*/;

# at 613997872

#210922 11:44:04 server id 8 end_log_pos 613997943 CRC32 0xe341c435 Table_map: `Schema_g2`.`per_sch` mapped to number 664

# at 613997943

#210922 11:44:04 server id 8 end_log_pos 614006154 CRC32 0xd71595b1 Delete_rows: table id 664

Steps to restore

Start replication on the backup server until the DROP statement.

START SLAVE UNTIL MASTER_LOG_FILE = 'PXC-8-bin.001611', MASTER_LOG_POS = 613996753;

1	START SLAVE UNTIL MASTER_LOG_FILE = 'PXC-8-bin.001611', MASTER_LOG_POS = 613996753;

Verify binlog position and that it is caught up till above and verify new record count on the backup server.

Verify replication is stopped again.

select count(*) from Schema_g2.per_sch;

1	select count(*) from Schema_g2.per_sch;

-Verify last id is < 178852

Use –where clause IF we notice any duplicate duplicates rows same as PXC-8 on the backup server.

<span style="font-weight: 400;">--where=“id < 178852”</span>

1	<span style="font-weight: 400;">--where=“id < 178852”</span>

Take a backup from the backup server with the below options to avoid dropping the newly added rows.

mysqldump -h backup-server-ip --single-transaction --skip-add-drop-table --no-create-info Schema_g2 per_sch > per_sch_backup.sql

1	mysqldump -h backup-server-ip --single-transaction --skip-add-drop-table --no-create-info Schema_g2 per_sch > per_sch_backup.sql

Verify no drops/created are present in the backup file.

Restore backup on source PXC-8:

mysql -h Source-server-ip Schema_g2 < per_sch_backup.sql

1	mysql -h Source-server-ip Schema_g2 < per_sch_backup.sql

Verify it completes ok and records are added back on source:

mysql -h Source-server-ip -e "select count(*) from Schema_g2.per_sch"

1	mysql -h Source-server-ip -e "select count(*) from Schema_g2.per_sch"

Verify records < 178852 and above also exist on the source.

Start replication on the backup server normally.

stop slave;

start slave;

stop slave;

start slave;

With loading the record on the source, it will get replicated to delay replica and other replicas and the data will be in sync again.

Conclusion

It is always recommended to have a Delay Replica in your architecture to avoid and quickly resolve such data losses.

0 Comments

Inline Feedbacks

View all comments

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Importance of Delay Replica in Data Recovery: Recover Deleted Records

Testing the Delay Replica which can help us recover the deleted record

Steps to restore

Conclusion

Related

Related Blog Articles

RECOMMENDED ARTICLES

MySQL 8.4 First Peek

LDAP Authentication in PgBouncer Through PAM

Trying out the PostgreSQL pg_tde Tech Preview Release

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Importance of Delay Replica in Data Recovery: Recover Deleted Records

Testing the Delay Replica which can help us recover the deleted record

Steps to restore

Conclusion

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

MySQL 8.4 First Peek

LDAP Authentication in PgBouncer Through PAM

Trying out the PostgreSQL pg_tde Tech Preview Release

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation