Stuck tracking down CXPACKET issue

Question

Post reply

Stuck tracking down CXPACKET issue

dwilliscp

SSCrazy Eights

Points: 9270
More actions
November 4, 2013 at 6:42 am

#299840

We have a server that often has the following happen:
1) It breaks a SPID down and assignes hundreds of threads to the SPID (not shown in sample below)
2) All threads for the SIPD go into CXPACKET wait for an extended amount of time
So we have two issues, how to keep a complex query from eating up all workers, and how to find out what is putting these queries on hold for extended amount of time (this one was on hold for 1560 seconds until I killed the SPID.
SpidecidAge Secondsstatuswait_type
125211560suspendedCXPACKET
125221560suspendedCXPACKET
125231560suspendedCXPACKET
125241560suspendedCXPACKET
125251560suspendedCXPACKET
125261560suspendedCXPACKET
125271560suspendedCXPACKET
125281560suspendedCXPACKET
125291560suspendedCXPACKET
125301560suspendedCXPACKET
125311560suspendedCXPACKET
125321560suspendedCXPACKET
125331560suspendedCXPACKET
125341560suspendedCXPACKET
125351560suspendedCXPACKET
125361560suspendedCXPACKET
125371560suspendedCXPACKET
125381560suspendedCXPACKET
125391560suspendedCXPACKET
125401560suspendedCXPACKET
125411560suspendedCXPACKET
125421560suspendedCXPACKET
125431560suspendedCXPACKET
125441560suspendedCXPACKET
125451560suspendedCXPACKET
125461560suspendedCXPACKET
125471560suspendedCXPACKET
125481560suspendedCXPACKET

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

arnipetursson SSCertifiable Points: 6557 More actions · Answer 1

You can set MAXDOP to half the number of cores on your server if the instance is the only instance on the server.

Fewer if more than one instance is on your server.

That will prevent processes from taking over your instance.

dwilliscp SSCrazy Eights Points: 9270 More actions · Answer 2

Currently our Cost of THreshold for Parallelism is 5 and the Max Degree of Parallelism is 0.

We have 24 cores.

Does that change your suggestion any?

arnipetursson SSCertifiable Points: 6557 More actions · Answer 3

You could set maxdop to 12 or 8 and see if that helps.

arnipetursson SSCertifiable Points: 6557 More actions · Answer 4

Mind you, this is only a band-aid.

You really should identify the query that is causing the issue and tune it.

Bhuvnesh SSC Guru Points: 59351 More actions · Answer 5

dwilliscp (11/5/2013)
Currently our Cost of THreshold for Parallelism is 5

you also need to increase value of threshold to avoid early parallelism occurence

-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

dwilliscp SSCrazy Eights Points: 9270 More actions · Answer 6

This morning when it failed there were only 7 SIPDs running on the server, we can subtract two... one was the base Trace and the second was database mail.

So that leaves just 5. My capturing of processes.. showing wait.. is a SQL statement. So how would I modify the following to capture the execution plan, without creating a trace to capture all execution plans? Note I have this running every 15min during the two hours that we have the most failures due to query timeout. Side note only the Jobs created parrallel processing.. 49 threads, 25 threads, and 49 threads. (Of course in this case I can get an estimated execution plan by selecting the job.)

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

insert into zt_wait_Hist(

Capture_DT

, [Spid]

, ecid

, start_time

, [Age Seconds]

, nt_username

, [status]

, wait_type

, [Individual Query]

, [Parent Query]

, [program_name]

, Hostname

, nt_domain

)

SELECT

GETDATE() as Capture_DT

, er.session_Id AS [Spid]

, sp.ecid

, er.start_time

, DATEDIFF(SS,er.start_time,GETDATE()) as [Age Seconds]

, sp.nt_username

, er.status

, er.wait_type

, SUBSTRING (qt.text, (er.statement_start_offset/2) + 1,

((CASE WHEN er.statement_end_offset = -1

THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2

ELSE er.statement_end_offset

END - er.statement_start_offset)/2) + 1) AS [Individual Query]

, qt.text AS [Parent Query]

, sp.program_name

, sp.Hostname

, sp.nt_domain

FROM sys.dm_exec_requests er

INNER JOIN sys.sysprocesses sp ON er.session_id = sp.spid

CROSS APPLY sys.dm_exec_sql_text(er.sql_handle)as qt

WHERE session_Id > 50

AND session_Id NOT IN (@@SPID)

ORDER BY session_Id, ecid

dwilliscp SSCrazy Eights Points: 9270 More actions · Answer 7

If I just take the Individual Query and run an estimated plan

Plan A: it does not show any suggested index's .. you start with a Clustered Index Scan (cost 74%), move to Sort (20%) > Parallelism (6%) > Select (0%)

This is the one that failed to execute in 20min.

No blocking showed up in the trace...

Estimated rows 28M, estimated cost 1,878, cashed plan 24b

All SPIDs of these SPIDs were executing Select statements.. the writes were going on in the SSAS cubes. (for the jobs) and the users were only doing selects.

dwilliscp SSCrazy Eights Points: 9270 More actions · Answer 8

You know ... I am begining to wonder if the problem lies in SSAS, I have passed this mornings information off to the team that handles our cubes.