Prior to SQL Server 2008, there was no way to retroactively find deadlock information. Obtaining deadlock graphs required that a SQL Trace was actively running, or that Trace Flag 1222 or 1205 were turned on for the instance. Since tracing deadlocks by either of these methods can be resource intensive, this usually meant that a series of deadlocks had to occur to prompt starting a trace or enabling the trace flags. In SQL Server 2008, a new advanced troubleshooting feature called Extended Events has change all of that.
For a good overview of Extended Events, read Paul Randal's article SQL 2008: Advanced Troubleshooting with Extended Events.
SQL Server 2005 added the default trace as a sort of "black box" for a SQL Server. SQL Server 2008 expands upon this with the system_health default session in Extended Events. This was added at the request of the SQL Server Customer Support Services team and collects information that the CSS team deemed important in troubleshooting common problems in SQL Server. These include:
- The sql_text and session_id for any sessions that encounter an error with severity >=20
- the sql_text and session_id for any sessions that encounter a "memory" type of error such as 17803, 701, etc (because not all memory errors are severity >=20)
- A record of any "non-yielding" problems (you have sometimes seen these in the ERRORLOG as Msg 17883)
- Any deadlocks that are detected
- The callstack, sql_text, and session_id for any sessions who have waited on latches (or other interesting resources) for > 15 seconds
- The callstack, sql_text, and session_id for any sessions who have waited on locks for > 30 seconds
- The callstack, sql_text, and session_id for any session that have waited for an extended period of time for "external" waits or "pre-emptive waits".
For the first time in SQL Server, we can now get deadlock information after the fact without having to enable any additional tracing for troubleshooting. The system_health session uses a ring_buffer target which stores the information collected by events firing in memory as an XML document in the sys.dm_xe_session_targets DMV. This can be joined to the sys.dm_xe_sessions DMV to get the session information along with the data stored in the ring_buffer target:
select CAST(target_data as xml) as TargetData from sys.dm_xe_session_targets st join sys.dm_xe_sessions s on s.address = st.event_session_address where name = 'system_health'
Using XPath we can then get just the deadlock graphs from the XML by using a derived table and a CROSS APPLY to get the individual Event Nodes as follows:
select XEventData.XEvent.value('(data/value)[1]', 'varchar(max)') as DeadlockGraph FROM (select CAST(target_data as xml) as TargetData from sys.dm_xe_session_targets st join sys.dm_xe_sessions s on s.address = st.event_session_address where name = 'system_health') AS Data CROSS APPLY TargetData.nodes ('//RingBufferTarget/event') AS XEventData (XEvent) where XEventData.XEvent.value('@name', 'varchar(4000)') = 'xml_deadlock_report'
Since the Deadlock Graph is an XML document, we can then do a CAST( as XML). Or can we? While working on this article, I happened upon a bug in the current deadlock graph output from Extended Events. It has been filed as a bug on connect, and I have been informed that it will be fixed in a cumulative update or Service Pack for SQL Server 2008. This isn't necessarily a big bug, because the problem is a end tag mismatch for the <victim-list> XML Node. We can easily get around the bug by using the replace function to switch the incorrect end tag </deadlock> with the correct one </victim-list>.
While I am talking about bugs in Extended Events, there is another important one that has already been fixed in CU#1 that can affect whether or not data for the deadlock graph is actually collected and retained or not. If you run the above query and get a number of blank rows, then you will need to apply Cumulative Update #1 to your SQL Server 2008 instance. This bug is listed as SQL Hotfix #50003460 FIX: Extended Events session produces an empty value for data captured for certain actions and is included in the cumulative update.
To get around the invalid XML problem, perform an inline replace before casting to XML as follows:
select CAST( REPLACE( REPLACE(XEventData.XEvent.value('(data/value)[1]', 'varchar(max)'), '<victim-list>', '<deadlock><victim-list>'), '<process-list>','</victim-list><process-list>') as xml) as DeadlockGraph FROM (select CAST(target_data as xml) as TargetData from sys.dm_xe_session_targets st join sys.dm_xe_sessions s on s.address = st.event_session_address where name = 'system_health') AS Data CROSS APPLY TargetData.nodes ('//RingBufferTarget/event') AS XEventData (XEvent) where XEventData.XEvent.value('@name', 'varchar(4000)') = 'xml_deadlock_report'
From here, you can click on each XML graph and have it open expanded as a XML document in Management Studio. However, unlike the deadlock graph that is generated by SQL Trace, you can't save this XML document as a .XDL file and open it graphically in Management Studio. The reason for this is because the Extended Events Deadlock Graph is a new XML schema and format that differs from the one that is being output by SQL Trace and the Trace Flags for backwards compatibility. The new deadlock graph output will display multiple victims which was a short coming in the old format, so it is necessary to actually be able to read the deadlock graph if you plan to use the information generated by Extended Events for troubleshooting deadlocks.
For further information on how to do read the deadlock graph output, read Bart Duncan's blog posts on the subject. If you need further help, you can always post the graph output in the Administering Forum and one of the members here will be able to assist you with understanding the information contained in the graph.
Further information about the default system_health session can be found on the CSS Team blog.