Slow Network speed causing Transaction slowness in Asynchronous Commit mode

  • We have configured Always ON availability group with Asynchronous commit mode on a 64 CPU and 1024 Gb memory , SQL Server 2022 Enterprise edition Cu 14 servers . In Asynchronous commit mode transactions should commit on the Primary before getting acknowledgement from secondary replica, that means transactions should commit before it is hitting the network. But What I am noticing is that when the network speed is less between the Primary replica and Secondary Replica , the transaction execution time is increasing. I am running a Delete batch , and when the Network speed is normal it is taking 3-4 minutes , but the same batch delete is taking 10-12 minutes when the Network Speed is less between the Primary and Secondary replica.

    Any idea why Network bandwidth is impacting huge difference in transaction execution time on an Asynchronous Commit mode availability group?

    • This topic was modified 2 months, 2 weeks ago by  eldosepd.
    • This topic was modified 2 months, 2 weeks ago by  eldosepd.
  • As a guess - I would say you have no explicit transactions, right? If so, then the commit will happen as soon as the command completes. What I mean is if your query has 10 delete statements (for example), that is 10 implicit commits that occur - one per delete. Thus, things are slower. I suspect that if you put an explicit transaction around your deletes, you will get the performance you are expecting.

    BUT I am just guessing. What you provided didn't tell me much about what you were running.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • We need o do it as implicit transactions to avoid any blockings on the database, we are deleting millions of records everyday in a purge operation which cannot be done as an explicit transaction.  My main concern is why it is taking more time when the DB is in Asynchronous commit mode. If i do the delete in a Db which is not in always on it is taking 2-3 minutes, when I am adding the Db to Asynchronous commit Always on in same server  with a slower network secondary replica  it is taking 10-12 minutes.

  • My GUESS is since you have implicit transactions turned on, the delete is happening in batches and the "sync" is only happening AFTER the transaction completes. So if you are deleting millions of records, the sync only occurs after the delete completes and is syncing across all the data.

    Plus, implicit transactions will block if they are enabled. Implicit transactions mean that when your query starts running, "BEGIN TRANSACTION" is implied in the query even if you don't explicitly state it. If you don't commit or rollback the transaction, your query will have an open transaction when it completes. Are you using autocommit transactions perhaps? In that mode, each statement is a transaction and auto commits at the end of the statement.

    As a thought, you say it is fast when HA is off but slow when HA is on, any chance anything else is different such as users accessing the table or number of indexes or even different servers (and thus different resource loads)? I am just wondering if MAYBE the slowness isn't HA but is due to system use.

    If it is all the same, I'd try grabbing the statistics IO and statistics TIME as well as actual execution plans. May not hurt to check if MAYBE something is different between the two queries.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • No users connected to it as the servers are in project phase and no load is happening on the server during the tests. execution plan is same for both cases.

     

    • This reply was modified 2 months, 1 week ago by  eldosepd.
  • If the execution plans are the same, then that is weird that performance is different with the AO on and off, especially with async enabled as AO should just push things to the log and after the log is ready the secondary pulls that to make things sync up. Once the transaction is complete, it should push it to the log and then be done from the primary side of things. AO shouldn't impact your query performance.

    I do have a thought on the performance slowing down when the network is slow though - any chance you have your SQL data and log on NAS or SAN? If so, that may be using network bandwidth. So slow network could mean slow disk which would mean slow queries.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • I did the same test with a Standalone DB which is in the same SAN when the network is slow, but that is completing fast. So I am not thinking it is a SAN issue\SAN network issue.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply