Top Wait Types - CXPACKET and SOS_SCHEDULER_YIELD, but Low CPU?

  • Why would I see my top wait types of CXPACKET and SOS_SCHEDULER_YIELD, but see low CPU utilixation (20-40%)?

    Thanks, Dave

  • Lots of parallelism skew. Lower MaxDop

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • We get CXPACKET waits because we have some Stored Procedure queries that run in parallelism. Some of these cause so much blocking to itself, that we change the Query in the SP to use MAXDOP=1. We also have 16 CPUs but have Max Degree of Parallelism on the instance to 8 so all processors can not be busy with one query.

    Others, we let run because they run fast with parallelism than not to run in parallel. Also, the Estimated Cost of some queries can be reduced by tuning the query or using new indexes. SSMS for 2008 does a good job of displaying suggested indexes when you look at the cost.

    There is a Cost of Threshold on an instance and it is set at 5 by default. We have changed one instance Cost of Threshold to 50, thus reducing the possibility of some qureies to run in parallel.

    It is usually in production when we find these things, because our development and test environment aren't as powerful as production and we do not have the resources to test loads like production.

    God Bless,

    Thomas

    Thomas LeBlanc, MVP Data Platform Consultant

  • Thanks Gail and Thomas,

    Gail, I found out this week my understanding of parallelism was incorrect and am still trying to fully comprehend its meaning. I had thought parallelism refers to a process utilizing more then one processor to run a query statement. Not the case. If I now understand it corretly it means that SQL Server is attempting to run parts of a SQL statement at the same time, which corresponds to CPU threads. Is this correct? If I am understanding it then when you say 'parallelism skew' are are you referring to the number of processor sub-threads that are spawned to address a request? If so, how do I know when a sub-thread bottleneck is hit? Can this be seen through a DMV or Profiler? As usual, I appreciate your help.

    Thomas, I just read the BOL explanation of the Cost Threshold for Parallelsim and I may want to give this a try. Asssuming I understand it correctly if the default is 5 seconds and I set the option to 50 that means SQL Server will use parallelism if its estimated run time is 50 seconds or greater. Am I correct? If I am correct then it is possible that a process which runs in less then 50 seconds may actually run faster using parallelism, but we've told SQL Server in the above example to only consider processes with an estimated run time of 50 seconds or greater.

    Thanks, Dave

  • No, Cost Threshold is not in seconds. Read about Execution Plans, and you will see a 'Cost'. If the 'Cost' of the query is greater than the Threshold, SQL Server will see if Parallelism would help the query. It does not guarentee to run parallel. But, if you see CXPACKET as a Wait Stat, then you diffenitely have queries that are running parallel.

    From your comment, it is apparent you have not spent time learning about Execution Plan and their Cost. I would highly suggest reading and learning before changing a Production setting.

    Most server/instance settings by default are OK. It is only after lots of research and testing do we make changes to system level properties.

    Here is a starting point for you: http://www.simple-talk.com/sql/performance/execution-plan-basics/

    God Bless,

    Thomas

    Thomas LeBlanc, MVP Data Platform Consultant

  • Thanks Thomas. I'll check out the link, but my comments are based upon two BOL articles. Here is a quote from one.

    Use the cost threshold for parallelism option to specify the threshold at which SQL Server creates and executes parallel plans for queries. The cost refers to an estimated elapsed time in seconds required to execute the serial plan on a specific hardware configuration. Only set cost threshold for parallelism on symmetric multiprocessors (SMP).

    If it is not in seconds than what does the above statement indicate?

    Thanks, Dave

  • DBADave (2/13/2009)


    If I now understand it corretly it means that SQL Server is attempting to run parts of a SQL statement at the same time, which corresponds to CPU threads. Is this correct?

    No. Parallelism occurs when SQL runs certain operators of the query in parallel. So, if it decides that an index scan can be paralleled across 4 processors, then each of the four are scanning a portion of that index

    If I am understanding it then when you say 'parallelism skew' are are you referring to the number of processor sub-threads that are spawned to address a request?

    No.

    Parallelism skew is when the division of work across the involved processors is not equal. So one processor (for whatever reason) has more work to do than the others. Since all have to complete before the query can continue, the ones that finished earlier have to wait for the slower one. That wait is a CXPacket wait

    If so, how do I know when a sub-thread bottleneck is hit? Can this be seen through a DMV or Profiler?

    No. It has nothing to do with subthreads (which is a bad term since a thread is an atomic entity and cannot be broken down further)

    If you're seeing excessive CXpacket waits, then you have a lot of parallelism skew. Usual recommendation, reduce the system's MAX DOP. What you reduce it to depends on the no of processors and what type of work the system i doing. Data warehouse will usually use higher MAXDOP than an OLTP system

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Most articles I have read and talks with SQL Server professional say Cost is not relative to anything one item that goes into the execution of a query. I can tell you from experience, we have queries that have a cost of >100, and run in 5 or less seconds, even after the cache is flushed.

    All tuning that we have done at the companies I worked for have been through alot of trial and error. MAXDOP is good for some, not so good for others.

    If you look at a graphical execution plan, you will see alot of Esimates, and only a couple of Actuals.

    Reads, writes (logical ad physical) are other items to look at while tuning.

    We have had problems with parallelism, and believe me, they are obvious when it becomes a problem.

    Your top wait stat might be just that, 'The Top Wait Stat', it might not be a problem, just the top wait state on your system.

    God Bless,

    Thomas

    Thomas LeBlanc, MVP Data Platform Consultant

  • Looks like I am 0 for 3. Not good. I did some reading so let's see if I have a better understanding.

    No. Parallelism occurs when SQL runs certain operators of the query in parallel. So, if it decides that an index scan can be paralleled across 4 processors, then each of the four are scanning a portion of that index

    Parallelism occurs when SQL runs certain operators in parallel such as nested loop joins, index scans and seeks, RID lookups, sorts, table scans, etc.

    Parallelism skew is when the division of work across the involved processors is not equal. So one processor (for whatever reason) has more work to do than the others. Since all have to complete before the query can continue, the ones that finished earlier have to wait for the slower one. That wait is a CXPacket wait

    CXPackets occur when a query has its operations run in parallel, but not all operations complete at the same time. SQL Server cannot continue to the next SQL statement because not all operations have completed. This results in waiting defined as CXPacket. A CXPacket does not necessarily indicate a CPU bottleneck. It can also occur due to disk or other resource bottlenecks. For example, if a query has its operations run in parallel and part of the operation cannot complete quick enough due to an IO bottleneck, a CXPacket can be reported. Correct???

    We also noticed a situation yesterday where we had a lot of CXPackets, but the CPU and disks were fine. However, there were a high number of locks. We assume it is possible for CXPacket to result from SQL Server not being able to keep up with lock management. Does this sound plausible?

    Lastly, I hope, is my understanding of threads. BOL says:

    Computers that have multiple microprocessors or CPUs can execute one thread per CPU at the same time. For example, if a computer has eight CPUs, it can execute eight threads at the same time.

    I'm trying not to get to wrapped up in the BOL definition. Should I look at a thread as simply an execution path??? So when I see 20-30 ECIDs associated with a process id that means I have 20-30 execution paths???

    How am I doing teach 🙂

  • DBADave (2/14/2009)


    CXPackets occur when a query has its operations run in parallel, but not all operations complete at the same time.

    Not all threads processing a single query operator (like a join) finish at the same time.

    SQL Server cannot continue to the next SQL statement because not all operations have completed.

    SQL cannot continue with the next operator in the current query

    For example, if a query has its operations run in parallel and part of the operation cannot complete quick enough due to an IO bottleneck, a CXPacket can be reported. Correct???

    The thread that's incuring the IO wait will show an IO wait, the rest of the threads processing that operator will show a CXPacket

    We assume it is possible for CXPacket to result from SQL Server not being able to keep up with lock management. Does this sound plausible?

    No. More likely one or more of the threads was waiting on a lock and the rest were waiting for them to finish.

    If you have lots of locking, try and resolve that first and the CX packets may g away by themselves.

    I'm trying not to get to wrapped up in the BOL definition. Should I look at a thread as simply an execution path???

    No. BoL is perfectly correct. A thread is what executes on a CPY. A connection may have multiple threads

    So when I see 20-30 ECIDs associated with a process id that means I have 20-30 execution paths???

    You have 20 or 30 threads in the same process.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply