July 8, 2019 at 1:23 am
We have a 3-node cluster hosting a SQL Server AG: two nodes run in synchronous-commit mode and the third is a DR node in asynchronous-commit mode. The DR node has been removed from the possible owners list so that cluster ownership stays on the two synchronous nodes. Yesterday, insert queries suddenly started taking far too long and the primary replica began choking. When I logged in, there were not many active sessions, but the pending ones were taking very long to finish. Single insert queries that would normally finish in milliseconds took around 25 minutes to complete. Because applications were failing to connect to SQL Server, we immediately failed the AG over to the synchronous replica. I went through the logs and couldn't find anything about a lease timeout or any other issue that would cause this behavior. Our monitoring tool showed preemptive waits, mainly preemptive_hadr_lease_mechanism. Apart from an error about the DR node being taken out of the cluster, there is nothing in the logs to help debug the issue. We also saw a huge spike in system threads, from 2k to 8k, during this time. I'd appreciate it if anyone can help me figure out what exactly happened on the server and why.
July 9, 2019 at 2:10 am
Thanks for posting your issue and hopefully someone will answer soon.
This is an automated bump to increase visibility of your question.
July 9, 2019 at 10:31 pm
You would want to check the AlwaysOn_health extended events session. It's there to help track down issues like this. With SQL Server 2014, you'd want to be on at least SP2, since it adds additional lease diagnostics. This link has more information on the improvements to the lease diagnostics:
Improved AlwaysOn Availability Group Lease Timeout Diagnostics
Sue
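As a rough illustration of the suggestion above, here is a minimal Python sketch for going through AlwaysOn_health events around the time of the incident. It assumes the raw event XML has already been pulled from the session's file target, for example with the T-SQL held in READ_HEALTH_EVENTS_SQL (the relative wildcard path assumes the instance's default log directory). The helper names (parse_event, events_in_window, count_by_name) and the sample events are made up for illustration and are not taken from the thread or from any real log.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from datetime import datetime, timezone

# T-SQL that returns the raw AlwaysOn_health events as XML. Run it on the
# replica that was primary during the incident. Assumption: the .xel files
# are in the instance's default LOG directory.
READ_HEALTH_EVENTS_SQL = """
SELECT object_name, CAST(event_data AS xml) AS event_data
FROM sys.fn_xe_file_target_read_file('AlwaysOn_health*.xel', NULL, NULL, NULL);
"""


def parse_event(event_xml):
    """Turn one extended-event XML document into a plain dict."""
    root = ET.fromstring(event_xml)
    # Extended events timestamps are UTC, e.g. 2019-07-07T14:31:02.118Z
    ts = datetime.strptime(root.attrib["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
    fields = {}
    for data in root.findall("data"):
        # Mapped values carry a readable <text>; fall back to <value>.
        text = data.findtext("text")
        fields[data.attrib["name"]] = text if text is not None else data.findtext("value")
    return {
        "name": root.attrib["name"],
        "timestamp": ts.replace(tzinfo=timezone.utc),
        "fields": fields,
    }


def events_in_window(event_xmls, start, end):
    """Parse events and return those between start and end, oldest first."""
    events = [parse_event(x) for x in event_xmls]
    in_window = [e for e in events if start <= e["timestamp"] <= end]
    return sorted(in_window, key=lambda e: e["timestamp"])


def count_by_name(events):
    """Count how many times each event type occurred."""
    return Counter(e["name"] for e in events)


# Made-up sample events, only to show the shape of the XML.
SAMPLE_EVENTS = [
    '<event name="availability_replica_state_change" package="sqlserver" '
    'timestamp="2019-07-07T14:31:02.118Z">'
    '<data name="previous_state"><value>1</value><text>PRIMARY_NORMAL</text></data>'
    '<data name="current_state"><value>2</value><text>RESOLVING_NORMAL</text></data>'
    '</event>',
    '<event name="availability_group_lease_expired" package="sqlserver" '
    'timestamp="2019-07-07T14:30:58.004Z">'
    '<data name="availability_group_name"><value>AG1</value></data>'
    '</event>',
    '<event name="error_reported" package="sqlserver" '
    'timestamp="2019-07-06T09:00:00.000Z">'
    '<data name="error_number"><value>35206</value></data>'
    '</event>',
]
```

Calling events_in_window(SAMPLE_EVENTS, start, end) with timezone-aware UTC datetimes that bracket the slowdown gives a time-ordered list that can be lined up against the cluster log and the SQL Server error log; count_by_name on that list shows which event types dominated the window.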