November 4, 2013 at 6:42 am
We have a server that often has the following happen:
1) It breaks a SPID down and assignes hundreds of threads to the SPID (not shown in sample below)
2) All threads for the SIPD go into CXPACKET wait for an extended amount of time
So we have two issues, how to keep a complex query from eating up all workers, and how to find out what is putting these queries on hold for extended amount of time (this one was on hold for 1560 seconds until I killed the SPID.
SpidecidAge Secondsstatuswait_type
125211560suspendedCXPACKET
125221560suspendedCXPACKET
125231560suspendedCXPACKET
125241560suspendedCXPACKET
125251560suspendedCXPACKET
125261560suspendedCXPACKET
125271560suspendedCXPACKET
125281560suspendedCXPACKET
125291560suspendedCXPACKET
125301560suspendedCXPACKET
125311560suspendedCXPACKET
125321560suspendedCXPACKET
125331560suspendedCXPACKET
125341560suspendedCXPACKET
125351560suspendedCXPACKET
125361560suspendedCXPACKET
125371560suspendedCXPACKET
125381560suspendedCXPACKET
125391560suspendedCXPACKET
125401560suspendedCXPACKET
125411560suspendedCXPACKET
125421560suspendedCXPACKET
125431560suspendedCXPACKET
125441560suspendedCXPACKET
125451560suspendedCXPACKET
125461560suspendedCXPACKET
125471560suspendedCXPACKET
125481560suspendedCXPACKET
November 4, 2013 at 6:45 pm
You can set MAXDOP to half the number of cores on your server if the instance is the only instance on the server.
Fewer if more than one instance is on your server.
That will prevent processes from taking over your instance.
November 5, 2013 at 7:54 am
Currently our Cost of THreshold for Parallelism is 5 and the Max Degree of Parallelism is 0.
We have 24 cores.
Does that change your suggestion any?
November 5, 2013 at 9:08 am
You could set maxdop to 12 or 8 and see if that helps.
November 5, 2013 at 9:14 am
Mind you, this is only a band-aid.
You really should identify the query that is causing the issue and tune it.
November 7, 2013 at 6:56 am
dwilliscp (11/5/2013)
Currently our Cost of THreshold for Parallelism is 5
you also need to increase value of threshold to avoid early parallelism occurence
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
November 7, 2013 at 7:02 am
This morning when it failed there were only 7 SIPDs running on the server, we can subtract two... one was the base Trace and the second was database mail.
So that leaves just 5. My capturing of processes.. showing wait.. is a SQL statement. So how would I modify the following to capture the execution plan, without creating a trace to capture all execution plans? Note I have this running every 15min during the two hours that we have the most failures due to query timeout. Side note only the Jobs created parrallel processing.. 49 threads, 25 threads, and 49 threads. (Of course in this case I can get an estimated execution plan by selecting the job.)
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
insert into zt_wait_Hist(
Capture_DT
, [Spid]
, ecid
, start_time
, [Age Seconds]
, nt_username
, [status]
, wait_type
, [Individual Query]
, [Parent Query]
, [program_name]
, Hostname
, nt_domain
)
SELECT
GETDATE() as Capture_DT
, er.session_Id AS [Spid]
, sp.ecid
, er.start_time
, DATEDIFF(SS,er.start_time,GETDATE()) as [Age Seconds]
, sp.nt_username
, er.status
, er.wait_type
, SUBSTRING (qt.text, (er.statement_start_offset/2) + 1,
((CASE WHEN er.statement_end_offset = -1
THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2
ELSE er.statement_end_offset
END - er.statement_start_offset)/2) + 1) AS [Individual Query]
, qt.text AS [Parent Query]
, sp.program_name
, sp.Hostname
, sp.nt_domain
FROM sys.dm_exec_requests er
INNER JOIN sys.sysprocesses sp ON er.session_id = sp.spid
CROSS APPLY sys.dm_exec_sql_text(er.sql_handle)as qt
WHERE session_Id > 50
AND session_Id NOT IN (@@SPID)
ORDER BY session_Id, ecid
November 7, 2013 at 7:23 am
If I just take the Individual Query and run an estimated plan
Plan A: it does not show any suggested index's .. you start with a Clustered Index Scan (cost 74%), move to Sort (20%) > Parallelism (6%) > Select (0%)
This is the one that failed to execute in 20min.
No blocking showed up in the trace...
Estimated rows 28M, estimated cost 1,878, cashed plan 24b
All SPIDs of these SPIDs were executing Select statements.. the writes were going on in the SSAS cubes. (for the jobs) and the users were only doing selects.
November 7, 2013 at 8:06 am
You know ... I am begining to wonder if the problem lies in SSAS, I have passed this mornings information off to the team that handles our cubes.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply