In this article, we will demonstrate how to achieve automatic source failover for asynchronous replication when the replica is a Percona Server for MySQL (PS) instance and the source is a Percona XtraDB Cluster (PXC) cluster, using a virtual IP (VIP) managed by Keepalived.
Let us consider our architecture below with async replication from PXC to Percona Server for MySQL:
PXC              PS-MySQL
==============   ==============
node1        +----> node4
node2        |
node3 -------+
Our goal is to set node3 as the async replication primary source. Upon failure, the VIP should move to node2, and then to node1 if node2 also fails. Once node3 becomes available again, the VIP should come back to node3.
Why not use the MySQL built-in functionality below instead of Keepalived?
- Asynchronous Replication Connection Failover – Automatic source list (asynchronous_connection_failover_add_managed)
  - managed_type: the type of managed service that the asynchronous connection failover mechanism must provide for this server. The only value currently accepted is GroupReplication, so this mechanism cannot manage a PXC cluster.
- Automatic Asynchronous Replication Connection Failover
  - The replica stays connected to whatever host it was connected to, which will not work if the PXC node's wsrep_local_state_comment is not in a "Synced" state. For example, when we encountered this bug, the node went into an "Inconsistent" state, and the replica connected to it did not fail over to another healthy PXC node. In the PXC case, the clustercheck script used by Keepalived can check the health of a node for proper failover; see the sample check below.
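To see what Keepalived tracks, you can run clustercheck manually on any PXC node. It is the same script PXC ships for HAProxy health checks: it prints an HTTP 200 response and exits 0 only when the node is in a healthy, Synced wsrep state, and the chk_pxc track_script in the Keepalived configuration shown later keys on that exit code. A quick manual check (exact output may vary by version):

node3> /usr/bin/clustercheck
HTTP/1.1 200 OK
...
Percona XtraDB Cluster Node is synced.
node3> echo $?
0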
Process
Assumptions:
- We already have PXC-to-PS async replication working with the IPs below:
MySQL Hostname | IP              | Node  | Remarks | server_id
---------------+-----------------+-------+---------+----------
el9-171        | 192.168.122.171 | node1 | PXC     | 171
el9-172        | 192.168.122.172 | node2 | PXC     | 172
el9-173        | 192.168.122.173 | node3 | PXC     | 173
el9-174        | 192.168.122.174 | node4 | PS      | 174
               | 192.168.122.170 |       | VIP-PXC |
- We already have the PXC Keepalived setup shown above, with a VIP that we will dedicate to replication use. The setup guide is in the blog Using Keepalived for HA on top of Percona XtraDB Cluster.
- The replication user has already been granted the required privilege; the grant can be run on any of the PXC nodes since it replicates cluster-wide. A sample grant is sketched below.
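For reference, a minimal replication user might look like the following; the user name, password, and host here are placeholders, not part of the original setup:

node1-mysql> CREATE USER 'replica'@'192.168.122.174' IDENTIFIED BY 'Str0ngRepl!Pass';
node1-mysql> GRANT REPLICATION SLAVE ON *.* TO 'replica'@'192.168.122.174';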
Prerequisites:
- Enable log_replica_updates on all PXC nodes to ensure that transactions from all nodes are written to every node's binary log, making any node qualified to act as the async replication source.
- Enable GTID-based replication for automatic replication positioning and correct failover; sample settings are sketched after this list.
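The exact values depend on your environment, but a minimal my.cnf sketch covering these prerequisites on a PXC node might look like this (server_id must be unique per node, matching the table above):

[mysqld]
server_id                = 171   # 172 and 173 on the other nodes
log_replica_updates      = ON
gtid_mode                = ON
enforce_gtid_consistency = ON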
Steps:
1) Edit and then verify the Keepalived priority, with the highest value on node3 down to the lowest on node1:
node1> egrep 'priority' /etc/keepalived/keepalived.conf
    priority 101

node2> egrep 'priority' /etc/keepalived/keepalived.conf
    priority 102

node3> egrep 'priority' /etc/keepalived/keepalived.conf
    priority 103
Sample configuration from node1:
node1> cat /etc/keepalived/keepalived.conf
vrrp_script chk_pxc {
    script "/usr/bin/clustercheck"
    interval 1
    user mysql
}

vrrp_instance PXC_as_async_master {
    state MASTER
    interface ens2
    virtual_router_id 51
    priority 101
    preempt
    virtual_ipaddress {
        192.168.122.170
    }
    track_script {
        chk_pxc
    }
}
2) Restart Keepalived on all nodes and check the status.
shell> systemctl restart keepalived
Check:
node1> journalctl -u keepalived.service -b | egrep ' systemd\[1\]: Starting | systemd\[1\]: Stopping | Entering | VRRP_Script' | tail -1
Nov 13 03:23:01 el9-171 Keepalived_vrrp[966]: (PXC_as_async_master) Entering BACKUP STATE

node2> journalctl -u keepalived.service -b | egrep ' systemd\[1\]: Starting | systemd\[1\]: Stopping | Entering | VRRP_Script' | tail -1
Nov 13 03:23:03 el9-172 Keepalived_vrrp[966]: (PXC_as_async_master) Entering BACKUP STATE

node3> journalctl -u keepalived.service -b | egrep ' systemd\[1\]: Starting | systemd\[1\]: Stopping | Entering | VRRP_Script' | tail -1
Nov 13 03:23:03 el9-173 Keepalived_vrrp[963]: (PXC_as_async_master) Entering MASTER STATE

node3> ip addr show dev ens2 | grep 'inet '
inet 192.168.122.173/24 brd 192.168.122.255 scope global noprefixroute ens2
inet 192.168.122.170/32 scope global ens2
We can see that node3 is in the MASTER state and holds the VIP, while the rest are in the BACKUP state.
3) Simulate a running write application.
node1> (set -e; while true; do mysql db1 -e"INSERT INTO t(s) VALUES('a');"; sleep 5; done)
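The loop assumes the db1 schema and table t already exist; a hypothetical definition, sufficient for this test, could be:

node1-mysql> CREATE DATABASE IF NOT EXISTS db1;
node1-mysql> CREATE TABLE IF NOT EXISTS db1.t (id INT AUTO_INCREMENT PRIMARY KEY, s VARCHAR(10));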
4) Check the replica’s current state using the bash function below:
node4> f_replica_status(){
mysql -e'SHOW REPLICA STATUS\G' | sed '/,$/N;s/\n/ /' | egrep 'Source_Host|Source_Log_File|Replica_IO_Running|Replica_SQL_Running|Last_IO_Error:|Last_SQL_Error:|Auto_Position|_Gtid_Set|Source_Server_Id|Source_UUID|Seconds_Behind_Source|Exec_Source_Log_Pos|Read_Source_Log_Pos'
}
Check replica status:
node4> f_replica_status
Source_Host: 192.168.122.173
Source_Log_File: s173.000036
Read_Source_Log_Pos: 102117
Relay_Source_Log_File: s173.000036
Replica_IO_Running: Yes
Replica_SQL_Running: Yes
Exec_Source_Log_Pos: 102117
Seconds_Behind_Source: 0
Last_IO_Error:
Last_SQL_Error:
Source_Server_Id: 173
Source_UUID: f81e1079-7f00-11ee-849e-525400146f98
Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates
Retrieved_Gtid_Set: 3d67d519-77d3-11ee-bcf4-8f28664ab56b:53752-53859
Executed_Gtid_Set: 3d67d519-77d3-11ee-bcf4-8f28664ab56b:1-53859, c2982ae6-882c-ee11-430b-70d799b54a94:1-2222
Auto_Position: 1
5) Reconfigure the replica to use the VIP (192.168.122.170). Since GTID auto-positioning is already enabled (Auto_Position: 1), only SOURCE_HOST needs to change:
node4-mysql> STOP REPLICA;
node4-mysql> CHANGE REPLICATION SOURCE TO SOURCE_HOST = '192.168.122.170';
node4-mysql> START REPLICA;
6) Verify the replica; Source_Host is now set to the VIP:
node4> f_replica_status
Source_Host: 192.168.122.170
Source_Log_File: s173.000036
Read_Source_Log_Pos: 107494
Relay_Source_Log_File: s173.000036
Replica_IO_Running: Yes
Replica_SQL_Running: Yes
Exec_Source_Log_Pos: 107494
Seconds_Behind_Source: 0
Last_IO_Error:
Last_SQL_Error:
Source_Server_Id: 173
Source_UUID: f81e1079-7f00-11ee-849e-525400146f98
Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates
Retrieved_Gtid_Set: 3d67d519-77d3-11ee-bcf4-8f28664ab56b:53875-53878
Executed_Gtid_Set: 3d67d519-77d3-11ee-bcf4-8f28664ab56b:1-53878, c2982ae6-882c-ee11-430b-70d799b54a94:1-2222
Auto_Position: 1
7) Test failover
7.1) Stop mysqld on node3:
node3> mysqladmin shutdown
7.2) Check the Keepalived state:
node1> journalctl -u keepalived.service -b | egrep ' systemd\[1\]: Starting | systemd\[1\]: Stopping | Entering | VRRP_Script' | tail -1
Nov 13 03:23:01 el9-171 Keepalived_vrrp[966]: (PXC_as_async_master) Entering BACKUP STATE

node2> journalctl -u keepalived.service -b | egrep ' systemd\[1\]: Starting | systemd\[1\]: Stopping | Entering | VRRP_Script' | tail -1
Nov 13 03:35:03 el9-172 Keepalived_vrrp[966]: (PXC_as_async_master) Entering MASTER STATE

node3> journalctl -u keepalived.service -b | egrep ' systemd\[1\]: Starting | systemd\[1\]: Stopping | Entering | VRRP_Script' | tail -1
Nov 13 03:35:02 el9-173 Keepalived_vrrp[963]: (PXC_as_async_master) Entering FAULT STATE
Node2 is the new MASTER, while node3 has changed to the FAULT state.
7.3) Check that the VIP has transferred to node2:
node2> ip addr show dev ens2 | grep 'inet '
inet 192.168.122.172/24 brd 192.168.122.255 scope global noprefixroute ens2
inet 192.168.122.170/32 scope global ens2
7.4) Check the replica. You may have to wait up to the value of SOURCE_CONNECT_RETRY, which is 60 seconds by default, for the replica to reconnect; a tuning sketch follows.
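If a one-minute reconnect window is too long for your use case, the retry interval can be lowered on the replica. A sketch with an illustrative value (SOURCE_RETRY_COUNT shown at its default):

node4-mysql> STOP REPLICA;
node4-mysql> CHANGE REPLICATION SOURCE TO SOURCE_CONNECT_RETRY = 10, SOURCE_RETRY_COUNT = 86400;
node4-mysql> START REPLICA;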
In the output below, the Source_Server_Id, Source_Log_File, Relay_Source_Log_File, and Source_UUID have all changed.
node4> f_replica_status
Source_Host: 192.168.122.170
Source_Log_File: s172.000033
Read_Source_Log_Pos: 198793
Relay_Source_Log_File: s172.000033
Replica_IO_Running: Yes
Replica_SQL_Running: Yes
Exec_Source_Log_Pos: 198793
Seconds_Behind_Source: 0
Last_IO_Error:
Last_SQL_Error:
Source_Server_Id: 172
Source_UUID: 6534a3eb-77d3-11ee-9870-52540028fd18
Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates
Retrieved_Gtid_Set: 3d67d519-77d3-11ee-bcf4-8f28664ab56b:53875-53973
Executed_Gtid_Set: 3d67d519-77d3-11ee-bcf4-8f28664ab56b:1-53973, c2982ae6-882c-ee11-430b-70d799b54a94:1-2222
Auto_Position: 1
The replica has now successfully failed over to the new source (node2).
When node3 comes back online, the VIP will also move back to node3 since it has the highest Keepalived priority, making it the asynchronous replication source again; this can be verified with the checks below.
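To confirm the failback, start mysqld on node3 (assuming the PXC service unit is named mysql) and re-run the same checks used earlier. Note that clustercheck only reports the node healthy after it rejoins the cluster and syncs, so the VIP returns once the node is Synced:

node3> systemctl start mysql
node3> ip addr show dev ens2 | grep 'inet '
node4> f_replica_status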
Percona Distribution for MySQL is the most complete, stable, scalable, and secure open source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!