April 20, 2010 at 3:19 pm
Hi,
I am monitoring the session_id of a large query which denormalises data into a large single table.
At certain points sys.dm_os_waiting_tasks returns CXPACKET for every thread.
I thought this might be a intra-query dead_lock but it eventually gets going again. Only one thread which is CXPACKET has a null for the blocking_exec_context_id.
The processors are not being maxed out but the disks are being thrashed.
What's going on?
ideas:
1) The thread that is managing the parrallism is waiting to get time on the processors (but I would expect the processors to be running at 100%)
2) Something to do with the large amount of IO work.
The instance has 8 cores and 20gb of memory
I've tried MAXDOP (8,6 and 4)
Thanks in advance.
April 20, 2010 at 8:44 pm
General advice:
Check that threads are processing a roughly equal number of rows.
Ensure useful, up-to-date statistics are available.
Avoid order-preserving exchanges where possible.
Check the type of partitioning being used.
Avoid driving parallelism from a small number of rows.
Ensure that useful indexes are available.
If you would like a more detailed analysis, post an actual execution plan in *.sqlplan format, together with output from the waiting tasks DMV.
April 21, 2010 at 12:11 am
Thanks for the advice. The question is not specifically related to this query. It's very complex and needs breaking down.
I'm just wondering how every thread can be waiting on CXPACKET without being a intra query dead lock? One of the theads must be wating on something else?
Thanks
April 21, 2010 at 1:35 am
jonwolds (4/21/2010)
I'm just wondering how every thread can be waiting on CXPACKET without being a intra query dead lock? One of the theads must be wating on something else?
CXPACKET waits are part of normal processing, and occur whenever synchronization is required.
Having all threads showing a wait status of CXPACKET is not all that unusual, but the details do depend on the query. It is also true to say that such waits are not a concern unless the wait times are long. Finally, finding all threads in this state can be just an artefact of the measurement process.
I'm sorry if that all sounds a bit hand-wavy, but without more details, I can only generalise 🙂
April 21, 2010 at 1:43 am
Hi,
CXPACKET wait type indicates that there is parallelism issue.
either query is using optiom MAXDOP or max degree of parallelism parameter is different than 1.
we had same problem. setting max degree of parallelism to 1 resolved our case.
Regards,
MShenel
April 21, 2010 at 2:19 am
Thanks Paul. I'll take an 'artifact of the measurement process' at this point. As long as this is not uncommon. I'll need to generate the same issue from a much simpler query to make examination of the execution plan a simpler job.
Thanks for your comments MShenel however I don't not think this is always wise advice. CXPACKET means that the thread is waiting for others in the session_id to finish what they are doing. This may be an IO task etc. This can be fine, so thottling your (for example) 32 core box to one core may impact on performance. The query may be fine or just need tuning a little.
April 21, 2010 at 2:32 am
shen-dest (4/21/2010)
CXPACKET wait type indicates that there is parallelism issue.
No, it indicates that a parallel query is running - see the description in the references below.
setting max degree of parallelism to 1 resolved our case.
By running everything serially, on one core. Often recommended for OLTP queries, but not any sort of good general solution.
References:
April 21, 2010 at 2:34 am
jonwolds (4/21/2010)
Thanks Paul. I'll take an 'artifact of the measurement process' at this point. As long as this is not uncommon. I'll need to generate the same issue from a much simpler query to make examination of the execution plan a simpler job.
Ok, cool. Also bear in mind what I said about the wait durations, and also that not every wait type means the system is blocked waiting for something 😉
April 28, 2010 at 7:07 pm
I'm engaged in a project looking at parallelism issues, too, and I see what you're seeing, a SPID with multiple lines all reporting current (and last) wait_type as cxpacket.
I have a theory, but it's complicated, and I don't know of any sql server internals doc that supports - or contradicts! - it.
The question to ask first is the opposite. Let's say you have an average OLTP server for these days, with 8 to 16 total cores (plus or minus hyperthreading!). The question is, how many distinct spids are in "running" state at any time? What do we even call the multiple lines per spid - tasks? I'm afraid I see some fuzziness in the way that individual tasks report status or wait_types, but I guess they can be generated in a number of different ways that is not easily seen. Anyway, I'm looking at a 16-core server, and seeing no more than three SPIDs "running" at a time. Presumably this is because one or more of them is using a bunch of processors for the parallel plan - we currently have the server setting of maxdop at 0, and no options specified at the statement level to limit maxdop either.
Soooo finally we get to the point. If you have a parallel plan, and it wants a bunch of processors for its parallel plan, what state is that in? Once upon a time, I imagined it would be "blocked", but no, that is not the case. One would think it would be "runnable", but no, that does not seem to be the case, either. What seems to be the case is that all tasks report suspended with a generic cxpacket as the wait_type.
That's my theory, my very own theory, and I'm sticking with it ... until someone disabuses me of something.
No matter how wrong it eventually turns out, HTH,
Josh
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply