Resolving Data Drift in a Dual-Primary Topology With Replica in MySQL/MariaDB

Hello friends,

In Managed Services, we have the opportunity to see different technologies and various topologies, which makes the work fascinating and challenging at the same time.

This time, I’m going to tell you about a particular case: a client with a dual-primary topology plus a replica, as detailed below:

PS-primary-1=192.168.0.14 [RW]
  |___ PS-primary-2=192.168.0.59 [RW] (Slave_delay: 0)
  |___ PS-replica-1=192.168.0.99 [R] (Slave_delay: 0)

PS-primary-1=192.168.0.14 [RW]

|___ PS-primary-2=192.168.0.59 [RW] (Slave_delay: 0)

|___ PS-replica-1=192.168.0.99 [R] (Slave_delay: 0)

[RW] means Read/Write access.

[R] means Read Only access.

We received an alert of this kind on PS-primary-2:

Last_SQL_Errno: 1032
Last_SQL_Error: Could not execute Update_rows event on table foo.persons; Can't find record in 'persons', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000360, end_log_pos 58177

1 2	Last_SQL_Errno: 1032 Last_SQL_Error: Could not execute Update_rows event on table foo.persons; Can't find record in 'persons', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000360, end_log_pos 58177

Seconds later, we noticed that someone (not us) resolved the replication issue, possibly by skipping the error in some way (using pt-slave-restart) or simply executing something like:

mysql> STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE;

1	mysql> STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE;

Or by injecting an empty transaction.

(Disclaimer: We won’t deeply analyze the root cause here, but rather the surprise we found during the analysis.)

Having seen this, the most advisable action is to run pt-table-checksum, and if any differences appear, use pt-table-sync to fix the data drift.

Therefore, we executed the pt-table-checksum in the following way:

$ pt-table-checksum h=192.168.0.14 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons

Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T12:40:31      0      1        6          1       1       0   0.193 foo.persons

$ pt-table-checksum h=192.168.0.14 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons

Checking if all tables can be checksummed ...

Starting checksum ...

TS ERRORS DIFFS ROWS DIFF_ROWS CHUNKS SKIPPED TIME TABLE

07-28T12:40:31 0 1 6 1 1 0 0.193 foo.persons

(Second disclaimer: We have populated the table with a small amount of data to simplify and improve visibility in this case. With larger volumes, the same scenario also applies.)

(Credentials, for security reasons, are stored in the file /home/user/.my.cnf, which is why you don’t see them in any executed command. If you want to know how to create it, here’s an example)

At a glance, there seem to be differences. Let’s check the other primary instance and the replica as well.

PS-primary-2> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 08:40:31 | foo | persons |          5 |      1 |
+---------------------+-----+---------+------------+--------+
1 row in set (0.00 sec)

PS-replica-1> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 09:53:10 | foo | persons |          4 |      1 |
+---------------------+-----+---------+------------+--------+
1 row in set (0.00 sec)

PS-primary-2> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;

+---------------------+-----+---------+------------+--------+

+---------------------+-----+---------+------------+--------+

| 2023-07-28 08:40:31 | foo | persons | 5 | 1 |

+---------------------+-----+---------+------------+--------+

1 row in set (0.00 sec)

PS-replica-1> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;

+---------------------+-----+---------+------------+--------+

+---------------------+-----+---------+------------+--------+

| 2023-07-28 09:53:10 | foo | persons | 4 | 1 |

+---------------------+-----+---------+------------+--------+

1 row in set (0.00 sec)

The next step is to execute pt-table-sync with the –print option to review what the utility will actually do. The following output appears:

$ pt-table-sync --print h=192.168.0.59 --sync-to-master --database=foo --table=persons --replicate percona.checksums

DELETE FROM `foo`.`persons` WHERE `personid`='3' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;

$ pt-table-sync --print h=192.168.0.59 --sync-to-master --database=foo --table=persons --replicate percona.checksums

DELETE FROM `foo`.`persons` WHERE `personid`='3' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;

REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;

REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;

A DELETE statement has appeared. How is it possible?

As this is a dual-primary topology, it is advisable to perform a checksum on the other primary server as well, which is PS-primary-2. This will help identify any discrepancies and ensure data consistency between both primary servers.

$ pt-table-checksum h=192.168.0.59 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons

Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T13:08:12      0      1        5          1       1       0   0.147 foo.persons

$ pt-table-checksum h=192.168.0.59 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons

Checking if all tables can be checksummed ...

Starting checksum ...

TS ERRORS DIFFS ROWS DIFF_ROWS CHUNKS SKIPPED TIME TABLE

07-28T13:08:12 0 1 5 1 1 0 0.147 foo.persons

That’s when we notice that one server has six rows, and the other has four. How is it possible? Let’s review:

PS-primary-1 (none)> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 10:03:00 | foo | persons |          6 |      1 |
+---------------------+-----+---------+------------+--------+

PS-replica-1 (none)> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 10:03:00 | foo | persons |          6 |      1 |
+---------------------+-----+---------+------------+--------+

PS-primary-1 (none)> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;

+---------------------+-----+---------+------------+--------+

+---------------------+-----+---------+------------+--------+

| 2023-07-28 10:03:00 | foo | persons | 6 | 1 |

+---------------------+-----+---------+------------+--------+

PS-replica-1 (none)> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;

+---------------------+-----+---------+------------+--------+

+---------------------+-----+---------+------------+--------+

| 2023-07-28 10:03:00 | foo | persons | 6 | 1 |

+---------------------+-----+---------+------------+--------+

Let’s see what data the table currently contains:

$ for SERVE in 192.168.0.14 192.168.0.59 192.168.0.99; do echo ; echo $SERVE ; mysql -h $SERVE -e "select * from foo.persons"; done

192.168.0.14
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.59
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.99
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

$ for SERVE in 192.168.0.14 192.168.0.59 192.168.0.99; do echo ; echo $SERVE ; mysql -h $SERVE -e "select * from foo.persons"; done

192.168.0.14

+----------+-----------+-----------+---------+------------+---------------------+

+----------+-----------+-----------+---------+------------+---------------------+

| 1 | Iglesias | Par | IND | Jaipur | 2023-07-27 07:10:10 |

| 2 | Rooster | War | IND | Tamil Nadu | 2023-07-27 07:10:10 |

| 4 | SoS | Syl | CAN | Ontario | 2023-07-27 07:10:10 |

| 5 | Al-lot | Pat | IND | Delhi | 2023-07-27 07:10:10 |

| 6 | The Crazy | Vai | IND | Vadodara | 2023-07-27 07:10:10 |

| 7 | Ewd | Woo | CAN | Vancouver | 2023-07-28 07:10:10 |

+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.59

+----------+-----------+-----------+---------+------------+---------------------+

+----------+-----------+-----------+---------+------------+---------------------+

| 2 | Rooster | War | IND | Tamil Nadu | 2023-07-27 07:10:10 |

| 3 | Colonel | Vil | PER | Lima | 2023-07-27 07:10:10 |

| 4 | SoS | Syl | CAN | Ontario | 2023-07-27 07:10:10 |

| 5 | Al-lot | Pat | IND | Delhi | 2023-07-27 07:10:10 |

| 6 | The Crazy | Vai | IND | Vadodara | 2023-07-27 07:10:10 |

+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.99

+----------+-----------+-----------+---------+------------+---------------------+

+----------+-----------+-----------+---------+------------+---------------------+

| 2 | Rooster | War | IND | Tamil Nadu | 2023-07-27 07:10:10 |

| 4 | SoS | Syl | CAN | Ontario | 2023-07-27 07:10:10 |

| 5 | Al-lot | Pat | IND | Delhi | 2023-07-27 07:10:10 |

| 6 | The Crazy | Vai | IND | Vadodara | 2023-07-27 07:10:10 |

+----------+-----------+-----------+---------+------------+---------------------+

As you can see, there is data drift everywhere.

And how can this be possible? I’ll enumerate several possibilities, and it is not an exhaustive list

Data drift and integrity of replica data is complex. Let’s outline some common ways this can happen (generally speaking):

A statement is executed on a primary with: SET SESSION sql_log_bin = OFF
A statement was executed directly on the replica
This can happen if the replica was not in super_read_only and a super user executed
This can happen if the replica was not in read_only
A statement was executed on a replica, and the replica was later promoted to a primary without GTID in place
A primary server is not configured for full ACID compliance, and it crashed
At some point, the primary was not configured for row-based replication (even briefly)

These are the most common ways that data drift occurs, but certainly not the only ways that it can happen. More exotic cases can involve bugs, engine differences, version differences, etc.

Coming back to the issue, if we run pt-table-sync targeting the second primary instance, we will see the following result:

$ pt-table-sync --print h=192.168.0.14 --sync-to-master --database=foo --table=persons --replicate percona.checksums

DELETE FROM `foo`.`persons` WHERE `personid`='1' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;
DELETE FROM `foo`.`persons` WHERE `personid`='7' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;

$ pt-table-sync --print h=192.168.0.14 --sync-to-master --database=foo --table=persons --replicate percona.checksums

DELETE FROM `foo`.`persons` WHERE `personid`='1' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;

DELETE FROM `foo`.`persons` WHERE `personid`='7' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;

REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;

Here, the output is less promising: it suggests deleting the rows with ID 1 and 7, and performing a REPLACE on ID 3.

Therefore, note that it asks us to delete rows, which is not correct. So, what to do? After a discussion with the customer (we’ve shared with them all that we gathered), we agreed on the following:

We need to add the row with ID=3 to the instances where it’s missing.
We need to update the row with ID=1 in the instances where it’s missing.
We need to add the row with ID=7 to the instances where it’s missing.
We DO NOT need to delete any rows; no DELETES are necessary.

Important note: The client informed us that the data that exists only in one instance and not in the others is necessary. The client also verified that the actions we proposed to them were correct. Remember: we, as DBAs, are the guardians of the data and its integrity, but not necessarily the owners, let alone know the content for ethical reasons.

How do we achieve this, then?

Unfortunately, in this case, using pt-table-sync is not recommended since, as we have seen with the –print option, it would perform DELETES in both cases, regardless of which primary instance we consider as the source of truth.

We also cannot execute the REPLACE on PS-primary-1 because the REPLACE command follows a specific logic explained here

MySQL uses the following algorithm for REPLACE (and LOAD DATA … REPLACE):

Try to insert the new row into the table
While the insertion fails because a duplicate-key error occurs for a primary key or unique index:
1. Delete from the table the conflicting row that has the duplicate key value
2. Try again to insert the new row into the table

That means that if we execute a REPLACE on PS-primary-1 where the row with ID 7 does not exist but does exist in PS-primary-2, it will result in a duplicate key error, which I’ll show you below (as the row already exists):

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 09:20:13');
Query OK, 1 row affected (0.01 sec)

1 2	PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 09:20:13'); Query OK, 1 row affected (0.01 sec)

We connected to PS-primary-2, and indeed, we found the duplicate key error:

PS-primary-2 (none)> PAGER grep "Last_Err"; SHOW SLAVE STATUSG
PAGER set to 'grep "Last_Err"'
                   Last_Errno: 1062
                   Last_Error: Could not execute Write_rows event on table foo.persons; Duplicate entry '3' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000359, end_log_pos 26840
1 row in set (0.01 sec)

PS-primary-2 (none)> PAGER grep "Last_Err"; SHOW SLAVE STATUSG

PAGER set to 'grep "Last_Err"'

Last_Errno: 1062

Last_Error: Could not execute Write_rows event on table foo.persons; Duplicate entry '3' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000359, end_log_pos 26840

1 row in set (0.01 sec)

The commands we should execute would be the following:

set sql_log_bin=0;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');

set sql_log_bin=0;

REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');

REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');

REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');

However, as we observed, the first one will fail due to a duplicate key. So, how do we resolve this? The safest option, in my humble opinion, would be to execute the three REPLACE statements on each of the instances by first invoking this command:

PS-primary-1 > set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');
Query OK, 1 row affected (0.01 sec)


PS-primary-1 > Bye


PS-primary-2 > set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.00 sec)

PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');
Query OK, 1 row affected (0.01 sec)


PS-primary-2 > Bye


PS-replica-1 > set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-replica-1 > set global super_read_only=1; set global read_only=1;
Query OK, 0 rows affected (0.00 sec)

PS-primary-1 > set sql_log_bin=0;

Query OK, 0 rows affected (0.00 sec)

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');

Query OK, 1 row affected (0.01 sec)

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');

Query OK, 1 row affected (0.01 sec)

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');

Query OK, 1 row affected (0.01 sec)

PS-primary-1 > Bye

PS-primary-2 > set sql_log_bin=0;

Query OK, 0 rows affected (0.00 sec)

PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');

Query OK, 1 row affected (0.00 sec)

PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');

Query OK, 1 row affected (0.01 sec)

PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');

Query OK, 1 row affected (0.01 sec)

PS-primary-2 > Bye

PS-replica-1 > set sql_log_bin=0;

Query OK, 0 rows affected (0.00 sec)

PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');

Query OK, 1 row affected (0.01 sec)

PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');

Query OK, 1 row affected (0.01 sec)

PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');

Query OK, 1 row affected (0.01 sec)

PS-replica-1 > set global super_read_only=1; set global read_only=1;

Query OK, 0 rows affected (0.00 sec)

Finally, we verify that all instances have the same data, either with the SELECT statement or the pt-table-checksum command.

$ for SERVE in 192.168.0.14 192.168.0.59 192.168.0.99; do echo ; echo $SERVE ; mysql -h $SERVE -e "select * from foo.persons"; done

192.168.0.14
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.59
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.99
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

$ pt-table-checksum h=192.168.0.14 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons

Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T14:12:32      0      0        7          0       1       0   0.185 foo.persons

$ pt-table-checksum h=192.168.0.59 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons
Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T14:12:50      0      0        7          0       1       0   0.148 foo.persons

$ for SERVE in 192.168.0.14 192.168.0.59 192.168.0.99; do echo ; echo $SERVE ; mysql -h $SERVE -e "select * from foo.persons"; done

192.168.0.14

+----------+-----------+-----------+---------+------------+---------------------+

+----------+-----------+-----------+---------+------------+---------------------+

| 1 | Iglesias | Par | IND | Jaipur | 2023-07-27 07:10:10 |

| 2 | Rooster | War | IND | Tamil Nadu | 2023-07-27 07:10:10 |

| 3 | Colonel | Vil | PER | Lima | 2023-07-27 07:10:10 |

| 4 | SoS | Syl | CAN | Ontario | 2023-07-27 07:10:10 |

| 5 | Al-lot | Pat | IND | Delhi | 2023-07-27 07:10:10 |

| 6 | The Crazy | Vai | IND | Vadodara | 2023-07-27 07:10:10 |

| 7 | Ewd | Woo | CAN | Vancouver | 2023-07-28 07:10:10 |

+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.59

+----------+-----------+-----------+---------+------------+---------------------+

+----------+-----------+-----------+---------+------------+---------------------+

| 1 | Iglesias | Par | IND | Jaipur | 2023-07-27 07:10:10 |

| 2 | Rooster | War | IND | Tamil Nadu | 2023-07-27 07:10:10 |

| 3 | Colonel | Vil | PER | Lima | 2023-07-27 07:10:10 |

| 4 | SoS | Syl | CAN | Ontario | 2023-07-27 07:10:10 |

| 5 | Al-lot | Pat | IND | Delhi | 2023-07-27 07:10:10 |

| 6 | The Crazy | Vai | IND | Vadodara | 2023-07-27 07:10:10 |

| 7 | Ewd | Woo | CAN | Vancouver | 2023-07-28 07:10:10 |

+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.99

+----------+-----------+-----------+---------+------------+---------------------+

+----------+-----------+-----------+---------+------------+---------------------+

| 1 | Iglesias | Par | IND | Jaipur | 2023-07-27 07:10:10 |

| 2 | Rooster | War | IND | Tamil Nadu | 2023-07-27 07:10:10 |

| 3 | Colonel | Vil | PER | Lima | 2023-07-27 07:10:10 |

| 4 | SoS | Syl | CAN | Ontario | 2023-07-27 07:10:10 |

| 5 | Al-lot | Pat | IND | Delhi | 2023-07-27 07:10:10 |

| 6 | The Crazy | Vai | IND | Vadodara | 2023-07-27 07:10:10 |

| 7 | Ewd | Woo | CAN | Vancouver | 2023-07-28 07:10:10 |

+----------+-----------+-----------+---------+------------+---------------------+

Checking if all tables can be checksummed ...

Starting checksum ...

TS ERRORS DIFFS ROWS DIFF_ROWS CHUNKS SKIPPED TIME TABLE

07-28T14:12:32 0 0 7 0 1 0 0.185 foo.persons

Checking if all tables can be checksummed ...

Starting checksum ...

TS ERRORS DIFFS ROWS DIFF_ROWS CHUNKS SKIPPED TIME TABLE

07-28T14:12:50 0 0 7 0 1 0 0.148 foo.persons

Conclusion

Maintaining this type of topology (dual primaries) can be a headache, and they are prone to such problems. Therefore, resolving them is not always as simple as it may seem. The tools pt-table-checksum and pt-table-sync are essential for resolving these issues in most cases and for efficient verification, as in this case.

Remember that if you have any questions or concerns, you can always contact us.

Lastly, don’t forget to run the pt-table-sync with the –print option to verify and share with the client or data owner whether the data to be corrected and the proposed corrections seem appropriate. Always.

Let’s stay synchronized, and I’ll see you in the next blog!

0 Comments

Inline Feedbacks

View all comments

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Resolving Data Drift in a Dual-Primary Topology With Replica in MySQL/MariaDB

Conclusion

Related

Related Blog Articles

RECOMMENDED ARTICLES

MySQL 8.4 First Peek

LDAP Authentication in PgBouncer Through PAM

Benchmarking MongoDB Performance on Kubernetes

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Resolving Data Drift in a Dual-Primary Topology With Replica in MySQL/MariaDB

Conclusion

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

MySQL 8.4 First Peek

LDAP Authentication in PgBouncer Through PAM

Benchmarking MongoDB Performance on Kubernetes

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation