May 5, 2021 at 3:21 pm
I apologize in advance for this being vague, but it would take some time to include real code - there is a TON of it and some things would need to be masked out. So I am going to post this as a very general "throw out some ideas" kind of question.
We have a nightly process in our Great Plains system that basically takes a snapshot of the current historic aged trial balance and dumps the results into a table. It usually runs in under a minute, but recently we've had some failures due to the job hanging and running for hours - we typically end up killing it after 24 hours.
Here is the weird part... If I run the exact same code again, it often finishes in under a minute. Which, okay... it could be cached or whatever; but if we tweak the date parameters, we can get it to alternate between running normally (under a minute) and hanging - in testing we generally kill it after 10 minutes.
Every time it hangs, it's hanging on an update statement against temp tables created by the stored proc, with a wait type of CXCONSUMER.
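For anyone chasing something similar: while the statement is hung, the wait type can be confirmed live from the DMVs. This is just a generic sketch - the session_id of 123 is a placeholder for whatever SPID the job is running under:

```sql
-- Inspect the hung request; replace 123 with the actual session_id.
SELECT r.session_id,
       r.status,
       r.wait_type,      -- shows CXCONSUMER while it hangs
       r.wait_time,      -- milliseconds spent in the current wait
       r.command,
       t.text AS sql_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id = 123;
```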
Our first guess was some kind of bad data... but that doesn't seem to match, because bad data should make it fail every time. Our next guess was some other process clashing with it... but the thing is, I can replicate the problem on our reporting server, which is an hourly log-shipped copy of production. There are no other processes on the reporting server, and the databases are read-only.
My guess - based on the CXCONSUMER - is that it's resources. But I can't pinpoint anything that would be causing the issue on two different boxes. Both servers are pretty robust in terms of CPU, RAM, etc.
The stored procedure has parameters to allow us to provide a range of customer names - the nightly job leaves these blank, which means do all. We can run "D" through "Z" without an issue. If we try "0" (zero) through "D", it hangs. But "0" through "B" and "C" through "D" both work.
Again, I realize this is all very vague... sorry about that...
May 5, 2021 at 3:26 pm
It sounds like you know what you're doing, so I suspect you'll have considered this already, but could it be down to parameter sniffing?
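One quick way to rule sniffing in or out (a generic sketch, not your actual proc - the procedure and parameter names here are made up) is to recompile on each run so the plan can't be shaped by a previous caller's parameters:

```sql
-- Hypothetical call; OPTION (RECOMPILE) inside the proc's statement
-- forces a fresh plan per execution, eliminating sniffed parameters.
SELECT *
FROM dbo.SomeAgedBalanceTable
WHERE CustomerName BETWEEN @StartName AND @EndName
OPTION (RECOMPILE);
```

If the hang disappears with RECOMPILE, sniffing is a likely culprit; if it still hangs, you can cross it off the list.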
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
May 5, 2021 at 5:01 pm
Another thought (as I ran into a similar issue when working with Great Pains... I mean Great Plains) - try setting your MAXDOP on that query to 1. We had a similar thing where the query would run in under a minute in MOST cases, but every now and then would run for 7 hours with CXCONSUMER wait types.
We added a query hint of MAXDOP 1 and it fixed the issue for us.
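For illustration, the hint goes on the end of the specific statement - something along these lines (the temp table and columns here are made-up names, not the actual GP code):

```sql
-- Illustrative only: #AgedBalance is a hypothetical temp table.
UPDATE #AgedBalance
SET    AgeBucket = DATEDIFF(DAY, DocDate, GETDATE()) / 30
WHERE  AgeBucket IS NULL
OPTION (MAXDOP 1);  -- force a serial plan for this statement only
```

The nice part is it only affects the one statement, so you don't have to change the server-wide MAXDOP setting.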
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
May 5, 2021 at 5:15 pm
CXCONSUMER is waiting on parallelism.
Have you adjusted the SQL Server configuration for 'cost threshold for parallelism' to be something larger than the default? The default is 5, IIRC, which is wayyy too low, because it makes SQL use parallelism too aggressively when it's not really needed and just causes unnecessary overhead. Most people use a setting around 50, although I've seen as high as 300 used on some systems.
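If you want to try it, the change is a couple of sp_configure calls (50 is a common starting point, not a rule - test it on a non-production server first):

```sql
-- 'cost threshold for parallelism' is an advanced option,
-- so advanced options must be visible first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Raise the threshold from the default of 5 to 50.
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;
```

The setting takes effect immediately; no restart is needed, though existing cached plans may linger until they recompile.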
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
May 5, 2021 at 6:46 pm
Appreciate the responses 🙂
I had discounted parameter sniffing (perhaps unwisely) because the same job has been running for over ten years on five different GP databases without this issue. One of them is much larger, and has a greater variety of data, so it seemed like it should be the one affected instead of this one.
I did think of MAXDOP but was unconsciously avoiding that one because the process in question covers multiple stored procedures that generate dynamic SQL against temp tables - have I mentioned I hate this process?? - however...
The "cost threshold for parallelism" seems to have done the trick. At least, in our test environment I've run the thing several times after bumping that from 5 to 50 and it's run faster than it usually does. So that could very well be the fix... I kind of wish I knew why it became an issue all of a sudden, but I'll do some further testing and see how it goes.
Thanks, guys... especially given the somewhat lacking information I provided.
May 10, 2021 at 12:53 am
Just to follow up, that setting is a life saver.
The process I was having issues with is now running at around 40 seconds consistently; it used to take around a minute. We haven't had it hang since the change.
May 10, 2021 at 2:10 pm
Yeah. I'm not sure why MS hasn't changed the SQL Server default value for that setting. They've made other adjustments to critical starting values, particularly for tempdb, but somehow haven't adjusted that one yet.