Service Broker concurrency post 2016 upgrade

  • Hi!

    We recently performed an upgrade from 2012 to 2016 (13.0.4411.0) of our production environment.

    Among the issues we're experiencing is service broker contention when sending & receiving messages with high LATCH_EX & LATCH_SH waits.

    Prior to the upgrade the wait breakdown of these processes was WRITELOG(30%), MEMORY/CPU (30%), LCK_M_U & LCK_M_X (25%) and the rest taken up by the above mentioned latches.

    Things we've tried:
    - Setting new broker to all databases with queues.
    - Changing to delayed durability (forced) on the user database worst affected by these symptoms (We are aware of the risk here).
    - Database compatibility set to 130 with legacy cardinality estimation off.

    According to Microsoft, "nothing" has been changed in service broker since 2012...

    Trying our luck, but has anyone here experienced something similar with service broker when upgrading to 2016?

  • Sorry to hear about the problems you're having with Service Broker.  I tried Service Broker for a limited use-case a few years back after being introduced to it during an exam.  I personally do not like it and have stopped using it.  I'm wondering if you have considered a different approach to solving your business problem (instead of using Service Broker).

    Do you know Denny Cherry?  I've heard him talk about Service Broker (I believe it was a conference) and he seemed very competent with it.  Maybe you should send him the question - https://www.dcac.co/

  • Skandalis - Monday, February 20, 2017 12:24 PM

    Hi!

    We recently performed an upgrade from 2012 to 2016 (13.0.4411.0) of our production environment.

    Among the issues we're experiencing is service broker contention when sending & receiving messages with high LATCH_EX & LATCH_SH waits.

    Prior to the upgrade the wait breakdown of these processes was WRITELOG(30%), MEMORY/CPU (30%), LCK_M_U & LCK_M_X (25%) and the rest taken up by the above mentioned latches.

    Things we've tried:
    - Setting new broker to all databases with queues.
    - Changing to delayed durability (forced) on the user database worst affected by these symptoms (We are aware of the risk here).
    - Database compatibility set to 130 with legacy cardinality estimation off.

    According to Microsoft, "nothing" has been changed in service broker since 2012...

    Trying our luck, but has anyone here experienced something similar with service broker when upgrading to 2016?

    I've seen it on earlier version upgrades but not 2012 to 2016. This article on SB performance is older but still relevant - I'd take a look and see if you can implement any of the suggestions in this doc:
    Service Broker: Performance and Scalability Techniques

    Sue

  • As a thought - did you have your 2012 DB in compatibility mode when you upgraded to 2016?

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • bmg002 - Wednesday, February 22, 2017 12:32 PM

    As a thought - did you have your 2012 DB in compatibility mode when you upgraded to 2016?

    Hi!

    As a matter of fact, yes! What are you thinking? Based on your question I'm considering recreating our entire service broker architecture from scratch...

    Thanks for the other replies. Our queuing performed adequately & removing dependency is not an option.

  • that'd likely be it then.  Running it in compatibility mode you'd be using the old engine for your service broker.
    As you had said "According to Microsoft, "nothing" has been changed in service broker since 2012", but since you were running it in compatibility mode, you were not actully running it with the 2012 engine but an older engine.  So I wouldn't be surprised if something changed that you were not expecting.  Although this could be the wrong rabbit hole to jump down.

    That being said, it might not be the answer to your problem.  Might not hurt to check out this:
    http://www.sqlskills.com/blogs/paul/most-common-latch-classes-and-what-they-mean/

    which goes through those 2 latches you mentioned.  It also depends on how service broker was set up.  I know we use service broker here and I need to clean up our logging table every week or so.  Our process is that a stored procedure will write to an "outgoingmessage" table.  This table has a trigger which creates a message for the destination server.  the destination server has an "incomingmessage" table which has a trigger that processes the message based on the contract and message type sent.

    Are messages getting stuck in the queues?  I'm guessing you have a target and initiator queue and have activation enabled on both?
    Do you have a lot of conversations in an error state or are they all closing and cleaning up nicely?
    what is the result of this query on the database with service borker:

    SELECT COUNT(*) AS [open_conversaions]
        FROM [sys].[conversation_endpoints] ce WITH(NOLOCK)
                LEFT JOIN [sys].[services] s WITH(NOLOCK)
                ON ce.service_id = s.service_id
                LEFT JOIN [sys].[service_contracts] sc WITH(NOLOCK)
                ON ce.service_contract_id = sc.service_contract_id

    And this one:
    SELECT COUNT(*) AS [num_bad_conversaions]
        FROM [sys].[conversation_endpoints] ce WITH(NOLOCK)
                LEFT JOIN [sys].[services] s WITH(NOLOCK)
                ON ce.service_id = s.service_id
                LEFT JOIN [sys].[service_contracts] sc WITH(NOLOCK)
                ON ce.service_contract_id = sc.service_contract_id
                WHERE state LIKE 'DI' OR state LIKE 'ER'

    The first of these queries will show you the number of open conversations.  The second shows the number that are in the error or disconnect inbound state and should be safe to close.  
    If you have too many open conversations (error, disconnected inbound, or connected and never closed) I've noticed that service broker will slow down drastically.  

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Thanks to all involved who had a look at this.

    For closure: We tracked the issue down to query store consuming more resources than what we would have liked. Service broker was just a victim of the issue.

  • Putting this here since this was the thread I came upon while trying to find a problem.

    We migrated from SQL Server 2012 to SQL Server 2017, and that server's primary job involves service broker.  We immediately had performance issues with our Service Broker setup, including an insane amount of locking (LCK_M_IX). We finally tracked it down to the way the inserts were occurring, where the plan generated worked for a handful of inserts if the table was empty (it had an additional sort before the Index Insert) and were able to fix it by manually updating the stats and adding rowcount/rowsize of 100k to the empty tables. (We use hourly tables, so we couldn't just force the plan).  We initially turned off Query Store, but that turned out not to be our problem, and wound up being instrumental in figuring out our problem. I have other loads where I can't turn on Query Store, but it's not always the perf issue.  Hope this helps someone.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply