Yesterday i shared the first article in a three part series about the SQL Black Box Recorder, that you can read here, with a promise of at least two more posts. The first article in the series laid the groundwork about the black box recorder in SQL Server. Furthermore, there was a brief introduction into the three components of the black box recorder. The three components as laid out in that article are:
- Default Trace
- system_health Extended Event Session
- sp_server_diagnostics procedure
The first article went into deeper detail about the first leg of this three-legged black box recorder – the default trace. In this article, the focus will be directed to the next leg of the black box recorder – or the system_health Extended Event Session. If by some measure you are in the dark about what Extended Events is, then I recommend you read my personal blog series on the subject. There are numerous articles stepping through the subject in easy-to-digest fashion. You can find many of the articles in the following table of contents – here.
The focus of this article will not be to introduce the topic of Extended Events. Rather, it will be to introduce the system_health session and dive into it a fair bit.
SQL Black Box Recorder
What is the system_health session?
Beyond being a component of the black box for SQL Server, what exactly is this event session? The system_health is much as the name implies – it is a “trace” that attempts to gather information about various events that may affect the overall health of the SQL Server instance.
The event session will trap various events related to deadlocks, waits, clr, memory, schedulers, and reported errors. To get a better grasp of this, let’s take a look at the event session makeup based on the available metadata in the dmvs and catalog views.
DECLARE @SessionName VARCHAR(256) = 'system_health'; /* note: this only works if the session is running and the source is "server" not internal or something else. */SELECT xs.name AS SessionName , xs.session_source , xse.event_name , xo.description AS EventDesc , xsea.action_name , xoa.description AS ActionDesc , xse.event_predicate , sese.predicate, sese.predicate_xml FROM sys.dm_xe_sessions xs INNER JOIN sys.dm_xe_session_events xse ON xs.address = xse.event_session_address LEFT OUTER JOIN sys.dm_xe_session_event_actions xsea ON xse.event_session_address = xsea.event_session_address AND xse.event_name = xsea.event_name INNER JOIN sys.dm_xe_objects xo ON xse.event_name = xo.name AND xse.event_package_guid = xo.package_guid AND xo.object_type = 'event' LEFT OUTER JOIN sys.dm_xe_objects xoa ON xsea.action_name = xoa.name AND xsea.action_package_guid = xoa.package_guid AND xoa.object_type = 'action' INNER JOIN sys.server_event_sessions ses ON xs.name = ses.name LEFT OUTER JOIN sys.server_event_session_events sese ON ses.event_session_id = sese.event_session_id AND xse.event_name = sese.name WHERE xs.name = @SessionName ORDER BY xs.name,xse.event_name, xsea.action_name;
In addition to the types of events I mentioned, there are also a few more interesting things to discover from the results of this particular query. For instance, the actions that are being employed help to gather even more information such as the callstack, tsql_callstack, database id, and sql text of the query that was executing at the moment of the event that is trapped.
One very peculiar event that is trapped is the sp_server_diagnostics_component_result event. Remember from earlier in this article that I noted the third leg of the black box record happens to be a very similarly named session – “sp_server_diagnostics procedure”. When I cover this third leg, it will make a little more sense. Understand, for now, that your suspicion of it being a bit more integral to the black box (than merely 1/3 of the overall black box) is confirmed.
This is a good set of information that can be trapped. Furthermore, this is a good set of information which can give a fairly good diagnosis of several different potential problems within your SQL Server Instance! (Yes it may seem repetitive, but I wanted to underscore the importance of the data that can be trapped.) That said, compare the types of events to those of the default trace. Take note of how many events actually overlap between the two components of the black box recorder.
For ease of comparison, let’s simplify that previous query to get a set of events to read without the duplication due to the actions on each event.
DECLARE @SessionName VARCHAR(256) = 'system_health'; SELECT xs.name AS SessionName, xse.event_name FROM sys.dm_xe_sessions xs INNER JOIN sys.dm_xe_session_events xse ON xs.address = xse.event_session_address WHERE xs.name = @SessionName ORDER BY xse.event_name;
And since I am feeling particularly generous, I have lumped the events from the two traces together in the following query:
DECLARE @SessionName VARCHAR(256) = 'system_health'; SELECT xs.name AS SessionName , xse.event_name AS EventName , mv.map_value AS CategoryName FROM sys.dm_xe_sessions xs INNER JOIN sys.dm_xe_session_events xse ON xs.address = xse.event_session_address INNER JOIN sys.dm_xe_objects xo ON xse.event_package_guid = xo.package_guid AND xse.event_name = xo.name AND xo.object_type = 'event' INNER JOIN sys.dm_xe_object_columns oc ON oc.object_name = xo.name AND oc.object_package_guid = xo.package_guid AND oc.name = 'KEYWORD' INNER JOIN sys.dm_xe_map_values mv ON oc.type_name = mv.name AND oc.column_value = mv.map_key AND oc.object_package_guid = mv.object_package_guid WHERE xs.name = @SessionName UNION ALL /* events */SELECT 'Default Trace' AS SessionName , te.name AS EventName , tca.name AS CategoryName FROM sys.traces t CROSS APPLY ( SELECT DISTINCT gei.eventid FROM sys.fn_trace_geteventinfo(t.id) gei ) ca INNER JOIN sys.trace_events te ON te.trace_event_id = ca.eventid INNER JOIN sys.trace_categories tca ON te.category_id = tca.category_id WHERE t.is_default = 1 ORDER BY EventName;
That should help to compare the 17 events from the system_health session and the 34 from the default trace with a little more ease and speed. The events from the two sessions are rather complimentary to each as they help to construct the SQL Server Black Box recorder.
How to use this session?
With the events trapped by this session, there are a few ways to use the trapped data that should stand out pretty easily. To me, the quick application of the data from this session can come from the review of deadlocks, reported errors, or even waits analysis. Using the deadlock as probably the most common problem people will want to troubleshoot, this information is suddenly much more valuable and easy to access (no more need for that trace flag). Since the system_health is now trapping the deadlock by default, one could quickly access the deadlock graph from the event data and proceed to troubleshooting. More information on getting that deadlock graph can be found here and here.
Learn to use the data made available through this member of the black box recorder. Becoming familiar with this tool (and additionally with Extended Events) will make you a rock-star in the office. The information needed to troubleshoot most problems is right at your fingertips through the use of the different pieces of the black box recorder. If it is not available there, then more than likely it can be made available through a bit of tinkering with Extended Events. Get to know the tools and be that rock-star DBA.
For more uses of Extended Events, I recommend my series of articles designed to help you learn XE little by little.
Interested in seeing the power of XE over Profiler? Check this one out!
This has been the fifth article in the 2018 “12 Days of Christmas” series. For a full listing of the articles, visit this page.