Rebooting secondary replica of AG causes some databases to stop synchronizing

  • We are on SQL Server 2019 and had never had this issue until February, when it started happening during our Windows Update/server restarts.  Almost every month, 1 to 3 databases get messed up on one of the secondary nodes.  We went to CU19 in February.

  • We have been experiencing this issue since February, when we went to CU19.  One to three dbs get messed up almost every month when we do Windows Updates/server restarts.  We have 92 dbs total across the 3 AGs.  We had about 30 more last year, so the number of DBs is significantly lower now.

  • I hate to possibly jinx myself, but I've had 5 failovers with no secondaries out of sync.  I've turned on ADR (Accelerated Database Recovery) for a bunch of my databases: about 60 of the largest ones (out of 185), and only 60 because you need exclusive access to turn it on.  I'm on the latest patch, but I think it is more the ADR helping.  I first did 10 and had fewer secondaries out of sync, so I continued enabling it for more databases.  I believe that with ADR it does less work at startup and requires fewer threads.
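
For anyone wanting to try the same ADR rollout in batches, here is a minimal sketch that only generates the T-SQL for a list of databases. The database names and the helper names (`quote_name`, `adr_enable_statements`) are made up for illustration. The sketch does not connect to a server, and it assumes you run the output yourself against the primary in a window where the exclusive access mentioned above is possible.

```python
def quote_name(name: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def adr_enable_statements(databases):
    """Return one ALTER DATABASE statement per database name.

    Only the statement text is built here; nothing is executed.
    """
    return [
        f"ALTER DATABASE {quote_name(db)} SET ACCELERATED_DATABASE_RECOVERY = ON;"
        for db in databases
    ]


if __name__ == "__main__":
    # Hypothetical database names; start with a small batch, as the poster did.
    first_batch = ["SalesDB", "InventoryDB"]
    for statement in adr_enable_statements(first_batch):
        print(statement)
```

Reviewing the generated statements before running them keeps the rollout incremental, which matches the "first 10, then more" approach described in the post.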

  • We just pushed CU22 and are still experiencing the problem.  We had 3 dbs stop synchronizing even after doubling CPU and RAM per Microsoft's recommendations.  We are uploading logs to Microsoft for review.  Hopefully they'll find something useful, but I'm not holding my breath.

    ROTHJ: Thanks for the input, I'll ask my team and MS if we might be a good candidate to try ADR (although I'd hope that if we were they would have recommended it by now, we've had a case open for 6 months!)

  • I'm now up to 10 failovers with no sync issues.  I'm also on CU22.

  • I see they just released CU23 but I don't see any mention of a fix for this issue.

  • Closing the loop here: after almost a year of mostly unproductive back-and-forth on an MS ticket, we finally got the straight answer we suspected (paraphrasing the MS response):

    • ...by design availability groups require threads for each database so the more databases in an AG the more CPUs you need to support the databases....it seems there is not another option other than to spin up more clusters or increase processors...

    We are now working on a costly database-combining project to drive down our overall database count.
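
The threads-per-database point in that reply can be put into rough numbers. The sketch below uses the default `max worker threads` formula documented for 64-bit SQL Server and the documented rule that availability groups can use that value minus 40. The per-database counts (one redo thread per secondary database; on a primary, one log capture thread per database plus one log send thread per secondary copy) are a simplified lower bound, since parallel redo, startup recovery, and ordinary workload all need threads too. The CPU count in the example is hypothetical, as the thread never states one.

```python
AG_RESERVED_THREADS = 40  # AGs may use "max worker threads" minus 40


def default_max_worker_threads(logical_cpus: int) -> int:
    """Default 'max worker threads' for 64-bit SQL Server (2016 SP2 and later)."""
    if logical_cpus <= 4:
        return 512
    if logical_cpus <= 64:
        return 512 + (logical_cpus - 4) * 16
    return 512 + (logical_cpus - 4) * 32


def ag_thread_budget(logical_cpus: int) -> int:
    """Worker threads available to availability groups at the default setting."""
    return default_max_worker_threads(logical_cpus) - AG_RESERVED_THREADS


def min_secondary_threads(databases: int) -> int:
    """Lower bound on a secondary: one redo thread per secondary database."""
    return databases


def min_primary_threads(databases: int, secondary_replicas: int) -> int:
    """Lower bound on a primary: log capture per database plus log send per copy."""
    return databases + databases * secondary_replicas


if __name__ == "__main__":
    cpus = 8  # hypothetical CPU count, not taken from the thread
    for dbs in (92, 185, 300):  # database counts mentioned in the thread
        print(
            f"{dbs} dbs on {cpus} CPUs: budget {ag_thread_budget(cpus)}, "
            f"secondary needs at least {min_secondary_threads(dbs)}, "
            f"primary with 2 secondaries needs at least {min_primary_threads(dbs, 2)}"
        )
```

This is not a sizing tool. It only shows why the dedicated thread count grows with the database count while the budget grows with CPUs, which is the trade-off Microsoft described.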

  • I stopped by just to gasp about 300 dbs/server and 50 dbs/ag. I think I legit stopped breathing for just a second.
