April 5, 2023 at 5:29 pm
Rebooting secondary replicas should be non-impactful, but for about the last 6 months we have fairly consistently seen a handful of databases fail to synchronize after the reboot. Background:
To fix it, we manually remove the DB from the AG, drop it from the secondary, and add it back to the AG. Our IT team reports no unusual network activity or underlying infrastructure anomalies when this happens, and CPU/memory on the SQL Servers is stable/normal. I haven't been able to find anything noteworthy in the Windows Event logs on the SQL Servers.
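The manual fix above can be sketched as three T-SQL statements run on different replicas. The Python below only builds and prints them so the order and target replica are explicit; it does not connect to anything. It assumes automatic seeding is enabled (so re-adding the database on the primary reseeds the secondary), and the names `AG01` and `SalesDB` are hypothetical placeholders.

```python
# Build the T-SQL for the manual "remove, drop, re-add" fix.
# ASSUMPTION: automatic seeding is enabled, so ADD DATABASE on the primary
# is enough to reseed the secondary. Nothing here executes SQL.


def quote_name(name: str) -> str:
    """Bracket-quote an identifier the way T-SQL QUOTENAME does (']' doubled)."""
    return "[" + name.replace("]", "]]") + "]"


def reseed_statements(ag: str, db: str) -> list[tuple[str, str]]:
    """Return (replica_to_run_on, statement) pairs in execution order."""
    ag_q, db_q = quote_name(ag), quote_name(db)
    return [
        ("primary", f"ALTER AVAILABILITY GROUP {ag_q} REMOVE DATABASE {db_q};"),
        ("secondary", f"DROP DATABASE {db_q};"),
        ("primary", f"ALTER AVAILABILITY GROUP {ag_q} ADD DATABASE {db_q};"),
    ]


if __name__ == "__main__":
    # 'AG01' and 'SalesDB' are hypothetical names for illustration only.
    for where, stmt in reseed_statements("AG01", "SalesDB"):
        print(f"-- run on {where}\n{stmt}")
```

Note that removing the database on the primary takes it out of the AG on every secondary replica, not just the one that failed to synchronize, so all of them get reseeded when it is added back.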
Does anyone have ideas what might be causing a small subset of our databases to have these 'nonqualified transactions', or how to troubleshoot further?
April 6, 2023 at 6:10 pm
Thanks for posting your issue and hopefully someone will answer soon.
This is an automated bump to increase visibility of your question.
April 8, 2023 at 7:47 pm
We've had various issues, all very similar in nature, and never managed to find a root cause. Methods of fixing include:
You can probably tell we've seen our fair share of issues.
You might want to review your max worker threads setting. If I'm reading it right, you've got up to 300 databases per instance, which could easily cause worker thread starvation, as each replicated database requires 2 or 3 threads. If a database has no activity, it doesn't hold a worker thread; the thread is released back to the pool. I expect that when SQL Server starts up it has to evaluate the replication state of every database, so it will need more threads at that point.
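The starvation argument above is just multiplication, so a rough budget check can be sketched as below. The "2 or 3 threads per database" figure is taken from the post, not from documentation, and the budget of 576 is a hypothetical instance (the default max worker threads for a 64-bit instance with 8 logical CPUs). Real demand also includes ordinary query workers, so this only shows the worst case where every AG database wants threads at once, as after a restart.

```python
# Rough worker-thread budget check for many AG databases on one instance.
# ASSUMPTION: threads_per_db uses the "2 or 3 threads" rule of thumb from
# the post above; it is not a documented constant.


def ag_thread_demand(databases: int, threads_per_db: int) -> int:
    """Worst case: every AG database wants its threads at the same moment,
    e.g. while they all re-synchronize after a restart."""
    return databases * threads_per_db


def headroom(max_worker_threads: int, databases: int, threads_per_db: int) -> int:
    """Worker threads left for everything else; negative suggests starvation."""
    return max_worker_threads - ag_thread_demand(databases, threads_per_db)


if __name__ == "__main__":
    # Hypothetical instance: 576 = default max worker threads, 8 logical CPUs.
    max_workers = 576
    for dbs in (300, 170):
        for per_db in (2, 3):
            print(f"{dbs} DBs x {per_db} threads: headroom "
                  f"{headroom(max_workers, dbs, per_db)}")
```

Because idle AG databases hand their threads back to the pool, a server can look fine in steady state and still run short in the burst right after a reboot, which matches the symptom in this thread.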
We used to have a server with a large number of databases and we had similar issues after failover. Once we reduced the number of databases on the server the problems went away.
April 10, 2023 at 1:53 pm
Yes, we have ~300 dbs/server and indeed we do see thread exhaustion errors. We are working on an initiative to reduce the number of databases but it's a slow process.
Thanks for the reply, misery loves company 🙂
April 12, 2023 at 3:10 pm
I also started experiencing this about 6 months ago; I wonder whether an SP introduced the issue. I had over 300 tiny databases in my AG. I've reduced that to about 170 and I still experience it. I tried artificially bumping up the thread count, but I still have the issue. The server is otherwise not using much CPU. I'm on SPLA licensing, so adding cores would be expensive. I also have to remove the 'not synchronizing' databases from the AG and re-add them. I agree it needs a lot of threads at startup or failover. I have thought about splitting my 8-core cluster into two 4-core clusters to get more threads; my SQL licensing would be the same in that scenario. Please let me know if you find any resolution. Thanks
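The split idea works because each instance gets its own worker pool, and the default pool size is not linear in CPU count. A minimal sketch, assuming the default formula Microsoft documents for 64-bit SQL Server (check the docs for your exact version), that cores equal logical CPUs (no hyperthreading), and that max worker threads is left at 0 so SQL Server computes the default:

```python
# Compare default worker-thread pools: one 8-CPU instance vs two 4-CPU ones.
# ASSUMPTION: cores == logical CPUs and 'max worker threads' is left at 0
# (auto), so the documented 64-bit default formula applies.


def default_max_worker_threads(logical_cpus: int) -> int:
    """Default max worker threads for a 64-bit SQL Server instance."""
    if logical_cpus <= 4:
        return 512
    if logical_cpus <= 64:
        return 512 + (logical_cpus - 4) * 16
    return 512 + (logical_cpus - 4) * 32


if __name__ == "__main__":
    one_big = default_max_worker_threads(8)        # single 8-CPU instance
    two_small = 2 * default_max_worker_threads(4)  # two 4-CPU instances combined
    print(f"one 8-CPU instance: {one_big} threads")
    print(f"two 4-CPU instances: {two_small} threads in total")
```

Under these assumptions one 8-CPU instance defaults to 576 threads, while two 4-CPU instances get 512 each (1024 combined), and each would also host only about half of the ~170 databases.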
April 14, 2023 at 9:39 pm
300 dbs in an AG?
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
April 14, 2023 at 9:45 pm
No, 300 dbs on a server. About 50 dbs/AG.
April 14, 2023 at 10:01 pm
First question: why 50 DBs per AG? Generally you want all of an application's databases in a per-application AG. Are these SharePoint databases?
April 14, 2023 at 10:13 pm
No they are not Sharepoint DBs. 50 DBs/AG is relatively arbitrary, that's just where we landed with our capacity testing when we first implemented AGs about 5 years ago.
Rebooting the secondary replicas used to be non-impactful (did not have issues for several years). We can't figure out why we've started having problems around 6 months ago.
April 14, 2023 at 10:17 pm
Same here, mine were fine for a few years before this issue started about 6 months ago.
April 15, 2023 at 2:13 pm
Have you checked the cluster log for more detailed event information?
April 17, 2023 at 12:31 pm
Yes we have combed the logs pretty thoroughly and can't find anything definitive. We have a ticket open with MS so I'll post back here with whatever they report.
April 21, 2023 at 4:16 pm
Hi Steal and Rothj
Do you have seeding mode set to Automatic on your AG?
Does this happen every time you reboot secondary?
April 21, 2023 at 6:57 pm
Yes, we are using Automatic seeding for our AGs.
Our reboots most often are due to patching, so we reboot many secondaries at the same time and yes, we always have a handful of databases that fail to re-synchronize following the reboots.
April 21, 2023 at 7:06 pm
Viewing 15 posts - 1 through 15 (of 39 total)