In this article, we will demonstrate how to achieve automatic source failover for asynchronous replication when the replica is a Percona Server for MySQL (PS) instance and the source is a Percona XtraDB Cluster (PXC) cluster, using a virtual IP (VIP) managed by Keepalived.
Let us consider our architecture below with async replication from PXC to Percona Server for MySQL:
PXC              PS-MySQL
==============   ==============
node1        +----> node4
node2        |
node3 -------+
Our goal is to set node3 as the async replication primary source. Upon failure, the VIP should move to node2, and then to node1 if node2 also fails. Once node3 becomes available again, the VIP should come back to node3.
Why not use the MySQL built-in functionality below instead of Keepalived?
- Asynchronous Replication Connection Failover – Automatic source list (asynchronous_connection_failover_add_managed)
  - managed_type: the type of managed service that the asynchronous connection failover mechanism must provide for this server. The only value currently accepted is GroupReplication, so this mechanism cannot manage a PXC cluster.
- Automatic Asynchronous Replication Connection Failover
  - The replica stays connected to whatever host it was connected to, which will not work if the PXC node's wsrep_local_state_comment is not in a "Synced" state. For example, when we encountered this bug, the node went into an "Inconsistent" state, and the replica connected to it did not fail over to another healthy PXC node. In the PXC case, the clustercheck script used by Keepalived can check the health of a node for proper failover; see the sample check below.
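To see what Keepalived tracks, you can run clustercheck manually on any PXC node. It is the same script PXC ships for HAProxy health checks: it prints an HTTP 200 response and exits 0 only when the node is in a healthy, Synced wsrep state, and the chk_pxc track_script in the Keepalived configuration shown later keys on that exit code. A quick manual check (exact output may vary by version):

node3> /usr/bin/clustercheck
HTTP/1.1 200 OK
...
Percona XtraDB Cluster Node is synced.
node3> echo $?
0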
Process
Assumptions:
- We already have PXC-to-PS async replication working with the IPs below:
MySQL Hostname | IP              | Node  | Remarks | server_id
---------------+-----------------+-------+---------+----------
el9-171        | 192.168.122.171 | node1 | PXC     | 171
el9-172        | 192.168.122.172 | node2 | PXC     | 172
el9-173        | 192.168.122.173 | node3 | PXC     | 173
el9-174        | 192.168.122.174 | node4 | PS      | 174
               | 192.168.122.170 |       | VIP-PXC |
- We already have the PXC Keepalived setup shown above, with a VIP that we will dedicate to replication use. The setup guide is in the blog Using Keepalived for HA on top of Percona XtraDB Cluster.
- The replication user has already been granted the required privilege; the grant can be run on any of the PXC nodes since it replicates cluster-wide. A sample grant is sketched below.
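For reference, a minimal replication user might look like the following; the user name, password, and host here are placeholders, not part of the original setup:

node1-mysql> CREATE USER 'replica'@'192.168.122.174' IDENTIFIED BY 'Str0ngRepl!Pass';
node1-mysql> GRANT REPLICATION SLAVE ON *.* TO 'replica'@'192.168.122.174';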
Prerequisites:
- Enable log_replica_updates on all PXC nodes to ensure that transactions from all nodes are written to every node's binary log, making any node qualified to act as the async replication source.
- Enable GTID-based replication for automatic replication positioning and correct failover; sample settings are sketched after this list.
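The exact values depend on your environment, but a minimal my.cnf sketch covering these prerequisites on a PXC node might look like this (server_id must be unique per node, matching the table above):

[mysqld]
server_id                = 171   # 172 and 173 on the other nodes
log_replica_updates      = ON
gtid_mode                = ON
enforce_gtid_consistency = ON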
Steps:
1) Edit and then verify the Keepalived priority, with the highest value on node3 down to the lowest on node1:
node1> egrep 'priority' /etc/keepalived/keepalived.conf
    priority 101

node2> egrep 'priority' /etc/keepalived/keepalived.conf
    priority 102

node3> egrep 'priority' /etc/keepalived/keepalived.conf
    priority 103
Sample configuration from node1:
node1> cat /etc/keepalived/keepalived.conf
vrrp_script chk_pxc {
    script "/usr/bin/clustercheck"
    interval 1
    user mysql
}

vrrp_instance PXC_as_async_master {
    state MASTER
    interface ens2
    virtual_router_id 51
    priority 101
    preempt
    virtual_ipaddress {
        192.168.122.170
    }
    track_script {
        chk_pxc
    }
}
2) Restart Keepalived on all nodes and check the status.
shell> systemctl restart keepalived
Check:
node1> journalctl -u keepalived.service -b | egrep ' systemd\[1\]: Starting | systemd\[1\]: Stopping | Entering | VRRP_Script' | tail -1
Nov 13 03:23:01 el9-171 Keepalived_vrrp[966]: (PXC_as_async_master) Entering BACKUP STATE

node2> journalctl -u keepalived.service -b | egrep ' systemd\[1\]: Starting | systemd\[1\]: Stopping | Entering | VRRP_Script' | tail -1
Nov 13 03:23:03 el9-172 Keepalived_vrrp[966]: (PXC_as_async_master) Entering BACKUP STATE

node3> journalctl -u keepalived.service -b | egrep ' systemd\[1\]: Starting | systemd\[1\]: Stopping | Entering | VRRP_Script' | tail -1
Nov 13 03:23:03 el9-173 Keepalived_vrrp[963]: (PXC_as_async_master) Entering MASTER STATE

node3> ip addr show dev ens2 | grep 'inet '
inet 192.168.122.173/24 brd 192.168.122.255 scope global noprefixroute ens2
inet 192.168.122.170/32 scope global ens2
We can see that node3 is in the MASTER state and holds the VIP, while the rest are in the BACKUP state.
3) Simulate a running write application.
node1> (set -e; while true; do mysql db1 -e"INSERT INTO t(s) VALUES('a');"; sleep 5; done)
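The loop assumes the db1 schema and table t already exist; a hypothetical definition, sufficient for this test, could be:

node1-mysql> CREATE DATABASE IF NOT EXISTS db1;
node1-mysql> CREATE TABLE IF NOT EXISTS db1.t (id INT AUTO_INCREMENT PRIMARY KEY, s VARCHAR(10));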
4) Check the replica’s current state using the bash function below:
node4> f_replica_status(){
mysql -e'SHOW REPLICA STATUS\G' | sed '/,$/N;s/\n/ /' | egrep 'Source_Host|Source_Log_File|Replica_IO_Running|Replica_SQL_Running|Last_IO_Error:|Last_SQL_Error:|Auto_Position|_Gtid_Set|Source_Server_Id|Source_UUID|Seconds_Behind_Source|Exec_Source_Log_Pos|Read_Source_Log_Pos'
}
Check replica status:
node4> f_replica_status
Source_Host: 192.168.122.173
Source_Log_File: s173.000036
Read_Source_Log_Pos: 102117
Relay_Source_Log_File: s173.000036
Replica_IO_Running: Yes
Replica_SQL_Running: Yes
Exec_Source_Log_Pos: 102117
Seconds_Behind_Source: 0
Last_IO_Error:
Last_SQL_Error:
Source_Server_Id: 173
Source_UUID: f81e1079-7f00-11ee-849e-525400146f98
Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates
Retrieved_Gtid_Set: 3d67d519-77d3-11ee-bcf4-8f28664ab56b:53752-53859
Executed_Gtid_Set: 3d67d519-77d3-11ee-bcf4-8f28664ab56b:1-53859, c2982ae6-882c-ee11-430b-70d799b54a94:1-2222
Auto_Position: 1
5) Reconfigure the replica to use the VIP (192.168.122.170). Since GTID auto-positioning is already enabled (Auto_Position: 1), only SOURCE_HOST needs to change:
node4-mysql> STOP REPLICA;
node4-mysql> CHANGE REPLICATION SOURCE TO SOURCE_HOST = '192.168.122.170';
node4-mysql> START REPLICA;
6) Verify the replica; Source_Host is now set to the VIP:
node4> f_replica_status
Source_Host: 192.168.122.170
Source_Log_File: s173.000036
Read_Source_Log_Pos: 107494
Relay_Source_Log_File: s173.000036
Replica_IO_Running: Yes
Replica_SQL_Running: Yes
Exec_Source_Log_Pos: 107494
Seconds_Behind_Source: 0
Last_IO_Error:
Last_SQL_Error:
Source_Server_Id: 173
Source_UUID: f81e1079-7f00-11ee-849e-525400146f98
Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates
Retrieved_Gtid_Set: 3d67d519-77d3-11ee-bcf4-8f28664ab56b:53875-53878
Executed_Gtid_Set: 3d67d519-77d3-11ee-bcf4-8f28664ab56b:1-53878, c2982ae6-882c-ee11-430b-70d799b54a94:1-2222
Auto_Position: 1
7) Test failover
7.1) Stop mysqld on node3:
node3> mysqladmin shutdown
7.2) Check the Keepalived state:
node1> journalctl -u keepalived.service -b | egrep ' systemd\[1\]: Starting | systemd\[1\]: Stopping | Entering | VRRP_Script' | tail -1
Nov 13 03:23:01 el9-171 Keepalived_vrrp[966]: (PXC_as_async_master) Entering BACKUP STATE

node2> journalctl -u keepalived.service -b | egrep ' systemd\[1\]: Starting | systemd\[1\]: Stopping | Entering | VRRP_Script' | tail -1
Nov 13 03:35:03 el9-172 Keepalived_vrrp[966]: (PXC_as_async_master) Entering MASTER STATE

node3> journalctl -u keepalived.service -b | egrep ' systemd\[1\]: Starting | systemd\[1\]: Stopping | Entering | VRRP_Script' | tail -1
Nov 13 03:35:02 el9-173 Keepalived_vrrp[963]: (PXC_as_async_master) Entering FAULT STATE
Node2 is the new MASTER, while node3 has changed to the FAULT state.
7.3) Check that the VIP has transferred to node2:
node2> ip addr show dev ens2 | grep 'inet '
inet 192.168.122.172/24 brd 192.168.122.255 scope global noprefixroute ens2
inet 192.168.122.170/32 scope global ens2
7.4) Check the replica. You may have to wait up to the value of SOURCE_CONNECT_RETRY, which is 60 seconds by default, for the replica to reconnect; a tuning sketch follows.
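If a one-minute reconnect window is too long for your use case, the retry interval can be lowered on the replica. A sketch with an illustrative value (SOURCE_RETRY_COUNT shown at its default):

node4-mysql> STOP REPLICA;
node4-mysql> CHANGE REPLICATION SOURCE TO SOURCE_CONNECT_RETRY = 10, SOURCE_RETRY_COUNT = 86400;
node4-mysql> START REPLICA;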
In the output below, the Source_Server_Id, Source_Log_File, Relay_Source_Log_File, and Source_UUID have all changed.
node4> f_replica_status
Source_Host: 192.168.122.170
Source_Log_File: s172.000033
Read_Source_Log_Pos: 198793
Relay_Source_Log_File: s172.000033
Replica_IO_Running: Yes
Replica_SQL_Running: Yes
Exec_Source_Log_Pos: 198793
Seconds_Behind_Source: 0
Last_IO_Error:
Last_SQL_Error:
Source_Server_Id: 172
Source_UUID: 6534a3eb-77d3-11ee-9870-52540028fd18
Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates
Retrieved_Gtid_Set: 3d67d519-77d3-11ee-bcf4-8f28664ab56b:53875-53973
Executed_Gtid_Set: 3d67d519-77d3-11ee-bcf4-8f28664ab56b:1-53973, c2982ae6-882c-ee11-430b-70d799b54a94:1-2222
Auto_Position: 1
The replica has now successfully failed over to the new source (node2).
When node3 comes back online, the VIP will also move back to node3 since it has the highest Keepalived priority, making it the asynchronous replication source again; this can be verified with the checks below.
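To confirm the failback, start mysqld on node3 (assuming the PXC service unit is named mysql) and re-run the same checks used earlier. Note that clustercheck only reports the node healthy after it rejoins the cluster and syncs, so the VIP returns once the node is Synced:

node3> systemctl start mysql
node3> ip addr show dev ens2 | grep 'inet '
node4> f_replica_status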
Percona Distribution for MySQL is the most complete, stable, scalable, and secure open source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!