Query never completes with CXPACKET waits

  • Hi,
    I have a query that runs in under 40 minutes on our test server but never completes on the production server.  It sits on an insert query and just spins its wheels with CXPACKET waits.  I've read a lot about this and have been looking at other posts with this problem but I still can't figure out what's wrong.  The test server is SQL 2008R2, only has 3GB of memory configured for this instance, MAXDOP is set to 0 and Cost Threshhold for Parallelism is 5...the default settings.  It only has 1 file for TempDB and the query completes every time in under 40 minutes.  The production server is SQL 2012 and when I kicked off the query, the CPU immediately spiked to 99%.  It has 2 sockets with 4 cores per socket and 4 logical processors per socket, 8 total logical processors so I adjusted the MAXDOP to 6 and CTP to 50 and that helped keep the CPU to 50% while the query is running, but after 17 hours I went ahead and killed it.  I have another test server with 2016, same memory settings and 2 sockets with 2 cores per socket and 2 logical processors per socket, 4 total logical processors.  I left the MAXDOP and CTP to the defaults and kicked off the query and it has been running for over 22 hours with the same wait stats of CXPACKET.  It is the same exact database that is on the test server so same indexes and amount of data.  This 2016 test machine has nothing else running on it so I figured I would try and figure out what's going on while the query is still running.  I started a trace and nothing is happening. 
    -I see exclusive locks on tempdb, Blocked True, blocked by process 55 (which is the query ID)
    -session id 55 has 14.124 internal object space in tempdb and it shows the insert into a temp table for sys.dm_db_task_space_usage in tempdb
    -I see 16 rows in sys.dm_os_waiting_tasks, wait_type CXPACKET, all for session id 55
    - sys.dm_exec_requests shows 1 row, status SUSPENDED, command INSERT, wait_type CXPACKET, wait_resource NULL, open_transaction_count 2, open_resultset_count 1, cpu_time 142565589, parallel_worker_count 12
    -sys.dm_os_tasks shows 13 rows all for session id 55 of which 2 show running and the rest suspended. 

    Anyone have any ideas as to why the 2012 and 2016 servers, one with MAXDOP 6 and one with MAXDOP 0, the query never completes but this test server with MAXDOP 0 is able to run the query ok?  Is there anything else I can check on this test 2016 server while I have the query still running and spinning its wheels before I kill it and try and adjust the MAXDOP settings? 

    Any suggestions would be much appreciated. 🙂

    Thanks!
    Bea Isabelle

  • It would help to know what kind of query you are running. Setting MAXDOP 1 should significantly decrease wait time on CXPACKET.

  • Hi,
    Here is the insert query that it is stuck on
    

    Thanks!
    Bea Isabelle

  • I don't know how large those tables involved but certainly would try to limit the number of processes to 1

  • Hi,
    You mean setting the MAXDOP to 1? I'm going to eventually kill the query and test out some MAXDOP settings but it just doesn't make sense why it would run in under 40 minutes with MAXDOP set to 0 on my 2008R2 test server and on this 2016 test server with the same exact settings, same exact database, it has been running for over 23 hours? Just seems like something else is wrong here.

    Thanks!
    Bea Isabelle

  • Can you post the query? What other wait stats do you have? And what does the query plan say? Plenty of people up here like to see the query plans posted as it offers more to work with to offer better suggestions. 
    I wouldn't go down the path of immediately changing the maxdop to 1 since that may not have anything to do with it. A lot of times, all it really does it reduce you throughput so that you don't see the CXPacket waits. And just wait longer for things to complete. Sometimes it helps to move it to whatever half of your CPU count is but it really depends on your workload on the server. You could have some things really benefiting from parallelism and those could be queries that run much more often than this one
    Paul Randal has a couple of great articles on cxpacket waits that are worth investigating: 
    Knee-Jerk Wait Statistics : CXPACKET
    More on CXPACKET Waits: Skewed Parallelism

    Sue

  • Hi,
    I posted the query above.  Here is the sqlplan for the query that is still currently executing.  I will read the suggested articles and see if I can pinpoint anything that might help me understand why it won't complete.  I'm also including the sqlplan for the same query running against the same database that completed on the 2008R2 test server that has MAXDOP set to 0.

    Thanks!
    Bea Isabelle

  • Also, this 2016 test server has nothing else running on it. It is a newly built server and I restored the same database from the test server where the query completes with no problem.  I just wanted to try and run it on a server that has nothing else hitting it so that I can run profiler and queries against it to see what the problem could be.  Our 2012 production server where this same query also won't complete has many other things going on so it's harder for me to block out all the other 'noise' to troubleshoot.

    Thanks!
    Bea Isabelle

  • To be clear: I suggest using the query with OPTION(MAXDOP 1) not changing the server's settings.

  • Hi,
    This query is coming from an application that I cannot modify.  I am manually running it just to try and see what the problem is but I can't modify when it is automatically run in Production from the application.

    Thanks!
    Bea Isabelle

  • Isabelle2378 - Friday, February 10, 2017 5:45 PM

    Hi,
    This query is coming from an application that I cannot modify.  I am manually running it just to try and see what the problem is but I can't modify when it is automatically run in Production from the application.

    Check the language settings of the application. The two runs are executing different code. The first few statements of each execution are like this:

    The query which completed

    IF NOT EXISTS (SELECT COUNT(1) FROM SSystemLanguage WHERE [Language] = @Language)
    INSERT #HIndex VALUES (substring(@XKeyIndx, @pos + 1, @valuelen))

    SELECT TOP(1) @XKey = XKey from #HIndex
    DELETE FROM #HIndex WHERE XKey = @XKey

    -- Verify that key exist

    IF NOT EXISTS (SELECT TOP(1) 1 FROM TTraceStep WHERE XKey = @XKey)
    INSERT TTraceStepData



    The query which didn't finish

    INSERT

    #HIndex VALUES (substring(@XKeyIndx, @pos + 1, @valuelen))

    SELECT TOP(1) @XKey = XKey from #HIndex
    DELETE FROM #HIndex WHERE XKey = @XKey

    INSERT INTO #HCustReportTo
    SELECT DISTINCT
    TTraceStep.XKey
    FROM CExpandSteps


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Chris is right.  Look at the differences between the servers - both the environment and the actual SQL that's running.  I think you're barking up the wrong tree with MAXDOP.  CXPACKET waits mean you have parallelism, not necessarily that you have a problem with parallelism.  Changing to MAXDOP 1 just because you see CXPACKET waits would be the worst thing to do.  In any case, you'll notice that that the query that finishes uses parallelism itself.

    John

  • Alex Chamchourine - Friday, February 10, 2017 4:15 PM

     Setting MAXDOP 1 should significantly decrease wait time on CXPACKET.

    No, please don't recommend crippling the entire server because one query is paralleling badly.
    Sure, maxdop 1 will completely remove CXPacket waits, but because CXPacket isn't usually the problem (often hides the problem in fact), doing so probably just make the entire server slower.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • try update stats with fullscan.
    and if its no use  -  try this  on bd you run query

    ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;

  • Hi,
    I agree on the comments that MAXDOP is not the solution or that CXPACKETS is the problem.  I'm trying to figure out why it's behaving this way on my production server but having no problems completing on the test server.  This is why I restored the same exact database to another test server, has the same memory, the same MAXDOP settings (default 0) and getting 2 different results.  My 2008R2 test server completes in 40 minutes (3GB memory, MAXDOP 0) and my 2016 test server never finishes.  Just hangs and the only thing I see is suspended waiting on CXPACKETS.  There is no load on this server, it is brand new and no one is using it but me for this testing.  So I'm trying to figure out what am I missing?  The indexes are all the same, I rebuilt anything that was fragmented over 30%, updated statistics on both databases.  It's the same exact query so what else can I look at on this 2016 server to try and see why it won't complete? I was hoping to get some fresh eyes to maybe point out something obvious that I'm just not seeing. Both systems are running collation Latin1_General_BIN.  I asked the analyst to check the language setting of the application to see if he see's anything different with the way the 2 queries are running the code.  Any other thoughts?

    Thanks!
    Bea Isabelle

Viewing 15 posts - 1 through 15 (of 29 total)

You must be logged in to reply to this topic. Login to reply