SQL Server patching in Windows Cluster Failover + Availability Groups

Question

Post reply

SQL Server patching in Windows Cluster Failover + Availability Groups

domenm

Old Hand

Points: 367
More actions
September 10, 2021 at 10:30 am

#3928503
Hello,
we have SQL Server 2016 set up in an interesting configuration - we have Failover cluster (2 nodes - SERVER1, SERVER2) and in the same time, the databases are included in the Always On Availability Groups (SERVER1/2, SERVER3).
The patching process (OS + SQL CU) for the WSFC alone is quite straightforward, so failover SQL instances to active node, patch and restart passive node, failover SQL to the patched node and then patch the remaining node.
Also, patching the nodes in the Availability groups is not troublesome (patch standby server, failover, patch the other one).
But what about patching the nodes in the configuration we have? Are there any recommendations for that?
What I would do is:
1. patch standby replica in AG (SERVER3)
2. failover AG (to SERVER1)
3. failover SQL roles to active node in WSFC (to SERVER1)
4. patch and reboot passive WSFC node (SERVER2)
5. failover SQL roles to the patched WSFC node (SERVER2)
6. repeat patching for new passive node (SERVER1)
7. failover AG back to original node (SERVER3)
Would this be a correct procedure, or am I missing something?
Thanks for your opinions!
Regards,
Domen
- This topic was modified 3 years, 3 months ago by domenm.
Attachments:
You must be logged in to view attached files.
September 30, 2021 at 4:31 am

This was removed by the editor as SPAM

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Ant-Green SSC Guru Points: 113509 More actions · Answer 1

Patch then async manual failover replica

Patch whatever is then the secondary in the WSFC

Failover to secondary

Patch the remaining node.

If you want to failback afterwards do so, but not important

Jeffrey Williams SSC Guru Points: 90313 More actions · Answer 2

What is the purpose of the AG node? I am assuming that the FCI cluster is primary - SERVER3 is an AG secondary, and that secondary is set to asynch and read-only (probably setup and configured for reporting).

If the secondary is actually setup in another data center and utilized for DR, is it set to synchronous or asynchronous and what is the failover mode (should be manual).

In this type of configuration, you should not have automatic failover configured from the FCI to the AG (or from the AG back to the FCI). For patching, the idea is to create as little as possible interruption to the application/users - for that you would patch the passive FCI node and restart, patch the secondary and restart - then failover from the non-passive FCI node to the now updated passive FCI node, patch the now passive node and restart.

At that point, stop - you don't need to fail back to the other node. You can run on that node until the next patch cycle - at which point you will then be back on the first node of the FCI cluster.

But this all depends on how you have this setup and configured - so make sure you understand the full configuration and how each node in the cluster is utilized.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

domenm Old Hand Points: 367 More actions · Answer 3

Hi,

thanks for your replies!

@Jefrey The purpose of the secondary node (AG) is the DR, it resides in another data center and yes, it is async replication mode with manual failover. Also some of the databases that are part of AG here are also meant for read-only purposes, so that way we take some workload from the primary replica.

So, if I understand correctly your suggestion would be:

1. patch and restart passive WSFC node (SERVER2)

2. patch and restart the standby AG replica (SERVER3)

3, failover to the patched WSFC node (SERVER2)

4. patch and restart the new passive node (SERVER1) and leave it like this

Thanks!

Best regards,

Domen

Jeffrey Williams SSC Guru Points: 90313 More actions · Answer 4

The steps you outlined are correct - that process will allow for patching with minimal interruption.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

chuck.hamilton SSCommitted Points: 1539 More actions · Answer 5

I know this is an old post but wouldn't you want to patch the async secondary availability replica first like Ant-Green suggested? That primary replica regardless of which node it's on would then be sending log data from a lower version to a higher version which should be OK . If you patched the primary instance first you would end up sending log data from higher version to lower version which may not work.

I guess another option would be to suspend data movement for all databases until patching completed on all servers.

Jeffrey Williams SSC Guru Points: 90313 More actions · Answer 6

chuck.hamilton wrote:

I know this is an old post but wouldn't you want to patch the async secondary availability replica first like Ant-Green suggested? That primary replica regardless of which node it's on would then be sending log data from a lower version to a higher version which should be OK . If you patched the primary instance first you would end up sending log data from higher version to lower version which may not work.
I guess another option would be to suspend data movement for all databases until patching completed on all servers.

Both secondary nodes are patched prior to any failover. Assuming the primary is NODE1, the FCI secondary is NODE2 and the AG secondary is NODE3. You patch and restart the secondary nodes - both NODE2 and NODE3 are now up to date with OS and SQL updates.

You then failover the FCI to NODE2 and patch/restart NODE1.

At the time NODE3 comes up and synchronizes, up until the point you failover - you have NODE1 (lower version) sending data to NODE3 (upgraded). As soon as you failover, you now have NODE2 (upgraded) sending to NODE3 (upgraded).

This follows the standard rolling upgrade process - regardless of FCI vs AG cluster and regardless of CU vs Edition upgrade. In fact, I would perform this same type of rolling upgrade to upgrade from SQL Server 2017 to SQL Server 2019 by simply adding new nodes into the cluster, installing SQL Server 2019 on the new servers and then failover from the 2017 node to the 2019 node. After the failover - remove the 2017 node(s) from SQL Server and evict the node(s) from the cluster.

Note: only a single edition up is supported for a rolling cluster upgrade. That is - only a single edition of the OS and SQL Server is supported. So you can go from Windows Server 2016 to Windows Server 2019 and SQL Server 2017 to SQL Server 2019 - but you could not upgrade from Windows Server 2016 to Windows Server 2022 or from SQL Server 2017 to SQL Server 2022.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

SQL Server patching in Windows Cluster Failover + Availability Groups

Attachments: