Rajendra Gupta

SQL Server Always On Availability Group Data Resynchronization

May 9, 2019 by Rajendra Gupta

In my previous article, Data Synchronization in SQL Server Always On Availability Group, we described a scenario in which a secondary replica in synchronous data commit mode goes down and the SQL Server Always On Availability Group effectively switches to asynchronous data commit mode. This ensures that users' transactions can commit without waiting for the secondary replica to come back online.


In this article, we will explore the following scenarios related to SQL Server Always On Availability Groups.

  • Synchronous secondary replica resynchronization process in a SQL Server Always On Availability Group
  • Automatic failover when the primary replica goes down
  • Manual planned failover
  • Forced manual failover with data loss

Synchronous Secondary Replica resynchronization process in a SQL Server Always On Availability Group

Suppose we have three SQL Server Always On replicas as follows.

  • Two replicas in the DC (primary data center) using synchronous data commit
  • One replica in DR (disaster recovery site) using asynchronous data commit
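To confirm how each replica is configured, a query along the following lines can help. This is a minimal sketch that uses the standard catalog views sys.availability_groups and sys.availability_replicas; it assumes you run it on the primary replica and have permission to view server state.

```sql
-- List every replica with its configured commit mode and failover mode.
-- Run on the primary replica.
SELECT ag.name                    AS availability_group,
       ar.replica_server_name,
       ar.availability_mode_desc, -- SYNCHRONOUS_COMMIT or ASYNCHRONOUS_COMMIT
       ar.failover_mode_desc      -- AUTOMATIC or MANUAL
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
    ON ar.group_id = ag.group_id
ORDER BY ag.name, ar.replica_server_name;
```

In the topology described above, we would expect two rows with SYNCHRONOUS_COMMIT and one row with ASYNCHRONOUS_COMMIT.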


In the following screenshot, we can see that two nodes are down. I stopped the SQL Server service on both secondary replicas (connect to each replica server over RDP, open SQL Server Configuration Manager and stop the SQL Server service).

  • Synchronous data commit replica: down, so the primary switches to asynchronous data commit for it
  • Asynchronous data commit replica: down

sql server always on example

Once the synchronous secondary replica goes offline, its database status changes to Not Synchronizing. In this mode, the primary replica does not wait for an acknowledgement from the secondary replica.

In the following screenshot, we can see that the status of both secondary replicas is Not Synchronizing.

LSN status

We can use the sys.dm_hadr_database_replica_states DMV to get details about the LSNs on the primary and secondary replicas.

In a SQL Server Always On Availability Group, we can use last_commit_lsn to check the commit LSN on all the available nodes. We can see that end_of_log_lsn matched across the replicas while the availability group was in a synchronized state.
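The original query is not reproduced in the text, so the following is only a sketch of what such a check might look like. It joins sys.dm_hadr_database_replica_states to sys.availability_replicas to show the server name next to each LSN column; the column selection and ordering are my own choice.

```sql
-- Compare LSN positions across replicas. Run on the primary replica;
-- rows for replicas that are down may show stale or NULL values.
SELECT ar.replica_server_name,
       DB_NAME(drs.database_id) AS database_name,
       drs.synchronization_state_desc,
       drs.last_commit_lsn,    -- LSN of the last committed log record
       drs.end_of_log_lsn,     -- LSN of the last log record on that replica
       drs.last_hardened_lsn   -- last log block flushed to disk
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = drs.replica_id
ORDER BY database_name, ar.replica_server_name;
```

Running the same query before and after a burst of DML on the primary makes the gap between the replicas easy to see.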

LSN information

Now, connect to the primary replica and run some DML transactions. They generate transaction log records, but those records cannot be sent to the secondary replicas because the replicas are unavailable.

Execute the query again to check last_commit_lsn. We can see that the primary replica is ahead of the secondary replicas. You can also check the end_of_log_lsn column, which indicates the last log LSN on the respective replica.

LSN information

Bring the services online (connect to the corresponding replica server over RDP, open SQL Server Configuration Manager and start the SQL Server service). Once the SQL Server service is up on the secondary replica, it establishes a connection with the primary replica and sends its end_of_log_lsn to the primary replica. Earlier we noticed that the SQL Server Always On Availability Group switches to asynchronous mode and commits records on the primary replica only. It commits the transactions but does not truncate the log until the secondary replica is in sync again. The primary replica then sends all transaction log blocks starting from that end_of_log_lsn to the secondary replica.


The secondary replica receives these transaction log blocks and hardens them. The data synchronization mode is still asynchronous at this stage. The synchronization state also changes from Not Synchronizing to Synchronizing.

The secondary replica sends an acknowledgement for the transaction log blocks and repeats this process until the last_hardened_lsn of the primary and the secondary replica is the same.
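To watch the catch-up described above, one option is to poll the send and redo queues together with last_hardened_lsn. This is a sketch rather than the author's own query; it relies only on documented columns of sys.dm_hadr_database_replica_states.

```sql
-- Monitor resynchronization progress. Run on the primary replica and
-- repeat until the queues drain and last_hardened_lsn matches.
SELECT ar.replica_server_name,
       DB_NAME(drs.database_id) AS database_name,
       drs.synchronization_state_desc, -- NOT SYNCHRONIZING -> SYNCHRONIZING -> SYNCHRONIZED
       drs.last_hardened_lsn,
       drs.log_send_queue_size,        -- KB of log not yet sent to the secondary
       drs.redo_queue_size             -- KB of log hardened but not yet redone
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = drs.replica_id
ORDER BY database_name, ar.replica_server_name;
```

A log_send_queue_size that shrinks toward zero on the secondary's row suggests the primary is still shipping the backlog.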

LSN example

At this point, data synchronization in the SQL Server Always On Availability Group changes from asynchronous data commit back to synchronous data commit.

Note: A replica configured for asynchronous data commit stays in asynchronous mode. It does not change to synchronous data commit automatically. For the synchronous replica, SQL Server again starts waiting for an acknowledgement from the secondary replica for every transaction.

We have one secondary DR replica as well. In this case, a connection is first established between the primary and the secondary replica, and the primary replica sends the transaction log records after the secondary replica's end_of_log_lsn. The status of the DR replica changes from Not Synchronizing to Synchronizing.

LSN information


Automatic failover when the Primary Replica goes down

We can achieve automatic failover if the primary replica is lost. Automatic failover can occur only between replicas in synchronous data commit mode.
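As an illustration of that requirement, the statements below sketch how a secondary replica could be set to synchronous commit and then to automatic failover. The availability group name AG1 and the server name SQLNODE2 are hypothetical placeholders, not names from this article.

```sql
-- Run on the primary replica. AG1 and SQLNODE2 are hypothetical names.
-- Each MODIFY REPLICA statement changes one option at a time.
ALTER AVAILABILITY GROUP [AG1]
    MODIFY REPLICA ON N'SQLNODE2'
    WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);

-- Automatic failover requires the replica to be in synchronous commit mode.
ALTER AVAILABILITY GROUP [AG1]
    MODIFY REPLICA ON N'SQLNODE2'
    WITH (FAILOVER_MODE = AUTOMATIC);
```

The primary replica needs the same two settings for the pair to act as automatic failover partners.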

In SQL Server 2012 and 2014, automatic failover does not occur as long as the primary instance is available and healthy. The health of the individual databases in an availability group is not checked. Starting with SQL Server 2016, we can enable database-level health monitoring as well. We can configure the availability group so that a database becoming unavailable can also trigger an automatic failover.
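In SQL Server 2016 and later, this database-level check is controlled by the DB_FAILOVER option of the availability group. A minimal sketch, assuming a hypothetical availability group named AG1:

```sql
-- Run on the primary replica (SQL Server 2016 or later).
-- AG1 is a hypothetical availability group name.
ALTER AVAILABILITY GROUP [AG1] SET (DB_FAILOVER = ON);

-- Verify the setting: db_failover = 1 means database-level
-- health detection is enabled for the group.
SELECT name, db_failover
FROM sys.availability_groups;
```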

Automatic failover Steps

  • The status of the primary replica in the SQL Server Always On Availability Group changes from Synchronized to Disconnected
  • The secondary replica starts taking over the primary role in the availability group. It rolls forward any pending transactions in the recovery queue and hardens them
  • The secondary replica now works as the new primary replica. It rolls back any uncommitted transactions and the databases become available to users. If we use a listener in Always On, it automatically points all connections to the new primary replica. The new primary also starts asynchronous data commit and commits transactions on the primary replica only
  • Later, once the old primary replica becomes available again, it connects to the new primary replica as a secondary replica. It follows the steps we explored in the previous section and starts the data synchronization process. Once the databases are in sync, the new primary replica starts synchronous data commit with the new secondary replica. It does not fail over automatically again to return the primary role to the old primary replica (the current secondary replica)
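After a failover, the current role of each replica can be checked with a query like the one below. It is a sketch based on sys.dm_hadr_availability_replica_states, run on the new primary replica.

```sql
-- Show which replica currently holds the primary role and whether
-- the other replicas have reconnected. Run on the new primary replica.
SELECT ar.replica_server_name,
       ars.role_desc,                  -- PRIMARY or SECONDARY
       ars.connected_state_desc,       -- CONNECTED or DISCONNECTED
       ars.synchronization_health_desc -- HEALTHY, PARTIALLY_HEALTHY or NOT_HEALTHY
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = ars.replica_id
ORDER BY ar.replica_server_name;
```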

Manual planned Failover

We can perform a planned manual failover to transition a secondary replica to the primary role. We can perform a planned failover using SSMS or T-SQL. It has the following prerequisites.

  • Both the primary and the secondary replica should be running in synchronous data commit mode
  • The status of the SQL Server Always On Availability Group databases should be Synchronized

We can check whether a database is ready for manual failover using the is_failover_ready column of the sys.dm_hadr_database_replica_cluster_states DMV.
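A readiness check might look like the following sketch, which adds the replica server name from sys.availability_replicas so each row is easier to identify.

```sql
-- is_failover_ready = 1 means the database on that replica is synchronized
-- and can take over without data loss.
SELECT ar.replica_server_name,
       dcs.database_name,
       dcs.is_failover_ready
FROM sys.dm_hadr_database_replica_cluster_states AS dcs
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = dcs.replica_id
ORDER BY dcs.database_name, ar.replica_server_name;
```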

In the following screenshot, we can see that the bottom two rows show the replicas as failover ready. This is because we have two replicas in synchronous data commit mode. The DR secondary replica is in asynchronous data commit mode; therefore, its is_failover_ready value is not 1. We should initiate a planned manual failover from the secondary replica that is going to become the new primary.

Replica information

Manual planned failover actions

  • Once a user initiates a planned manual failover, the secondary replica rolls forward the pending log records and brings its databases online
  • It also rolls back any uncommitted transactions to keep the database in a consistent state
  • The secondary replica takes the role of the new primary replica and starts to synchronize with the current secondary replica (the old primary replica)
  • The database status remains Not Synchronizing until synchronization completes between the primary and the secondary replica
  • It then changes the database status to Synchronized
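In T-SQL, the planned failover described in the steps above comes down to a single statement. The sketch below assumes a hypothetical availability group named AG1.

```sql
-- Connect to the synchronized secondary replica that should become
-- the new primary, then run the failover there (not on the current primary).
-- AG1 is a hypothetical availability group name.
ALTER AVAILABILITY GROUP [AG1] FAILOVER;
```

If the target replica is not synchronized, the statement is expected to fail rather than risk data loss, which is why checking is_failover_ready first is worthwhile.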

Forced manual failover with data loss

We can force a manual failover to any secondary replica, including an asynchronous data commit replica. Usually, we should use forced failover for disaster recovery purposes only. Once we initiate a forced failover, the secondary replica takes the role of the new primary replica. In this case, data synchronization does not start automatically. It remains in the suspended state, and we need to resume it manually. You might lose data in a forced manual failover.
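The forced failover and the manual resume can be sketched in T-SQL as follows. AG1 and TestDB are hypothetical names used only for illustration, and the two steps run on different replicas.

```sql
-- Step 1: run on the secondary (for example, DR) replica that should become
-- the new primary. This can lose data.
-- AG1 is a hypothetical availability group name.
ALTER AVAILABILITY GROUP [AG1] FORCE_FAILOVER_ALLOW_DATA_LOSS;

-- Step 2: later, on a secondary replica that has come back online,
-- check whether its local database is suspended.
SELECT DB_NAME(database_id) AS database_name,
       is_suspended,
       suspend_reason_desc
FROM sys.dm_hadr_database_replica_states
WHERE is_local = 1;

-- Step 3: resume data movement for each suspended database.
-- TestDB is a hypothetical database name.
ALTER DATABASE [TestDB] SET HADR RESUME;
```

Resuming makes the secondary discard log records that the new primary never received, so any data you want to salvage from the old primary should be extracted before this step.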

In the following image, notice the following.

  • The secondary replica for synchronous data commit is down
  • Due to some issue, the primary replica also goes down. Now only the DR replica is available, and it is configured for asynchronous data commit
  • Due to asynchronous data commit, the last_hardened_lsn value is also different on the two replicas

SQL AlwaysOn Availability Group example

Once we initiate a forced failover, the secondary replica takes over the primary role. On this new primary replica, last_hardened_lsn is 90. Once we bring the old primary replica online, it shows its synchronization state as suspended, and the new primary replica and the old primary replica (now a secondary replica) communicate with each other. Since the last_hardened_lsn value is 90 on the new primary replica, the secondary replica (last_hardened_lsn 150) rolls back its transactions to LSN 90 and, after we resume data movement, starts the synchronization process. This rollback is the data loss caused by the forced manual failover.


Conclusion

In this article, we explored scenarios related to data resynchronization in SQL Server Always On Availability Groups. I will cover more material related to SQL Server availability groups in upcoming articles. If you have comments or questions, feel free to leave them in the comments below.
