Blog Post

Extended Events Intro

,

Extended Events is supposed to be taking over for most of Profiler and server-side tracing functionality, but there were people like me out there that took their time making the switch. For me the excuse was SQL Batch Completed isn’t in 2008 / 2008 R2, most databases are slow to move off of 2008 because it’s hard to say 2008 can’t do everything you want, and, well, I really like SQL Batch Completed!!!

Now I’m losing my excuses. It’s been three years since 2012 came out and all the new servers I’m installing are 2012 or 2014, so I lost that excuse. Then I came up with “I don’t know how to use them, but I know how to use something that does almost the same thing”; that’s great logic during an outage, but terrible logic for the long term. So, I’m learning it, still learning it, and I have something to help out those who made it to 2012 but are still stuck on the “I don’t know how to use them” phase of denial.

In several of my blog posts I referenced my favorite server-side trace. Capture everything on the server that takes over X seconds, with 5 being my favorite starting place. Yes, this adds about 1ms to anything that takes over 5 seconds, which adds up to…insignificant. Seriously, if this kills your server then it was dead already, but you will see arguments against creating this for long-term runs.  I don’t agree with those arguments, but they’re your servers and I want you to make an informed decision on what you run.

Anyways, here’s how I set it up with Extended Events, for which I used Jonathan Kehayias’s (b|t) Trace to XE Converter to get started.

IF EXISTS (SELECT 1 FROM sys.server_event_sessions WHERE name = 'DurationOver5Sec')
DROP EVENT SESSION [DurationOver5Sec] ON SERVER;
GO
CREATE EVENT SESSION [DurationOver5Sec]
ON SERVER
ADD EVENT sqlserver.rpc_completed(
ACTION 
(
  sqlserver.client_app_name-- ApplicationName from SQLTrace
, sqlserver.client_hostname-- HostName from SQLTrace
, sqlserver.client_pid-- ClientProcessID from SQLTrace
, sqlserver.database_id-- DatabaseID from SQLTrace
, sqlserver.request_id-- RequestID from SQLTrace
, sqlserver.server_principal_name-- LoginName from SQLTrace
, sqlserver.session_id-- SPID from SQLTrace
)
WHERE 
(
duration >= 5000000
)
),
ADD EVENT sqlserver.sql_batch_completed(
ACTION 
(
  sqlserver.client_app_name-- ApplicationName from SQLTrace
, sqlserver.client_hostname-- HostName from SQLTrace
, sqlserver.client_pid-- ClientProcessID from SQLTrace
, sqlserver.database_id-- DatabaseID from SQLTrace
, sqlserver.request_id-- RequestID from SQLTrace
, sqlserver.server_principal_name-- LoginName from SQLTrace
, sqlserver.session_id-- SPID from SQLTrace
)
WHERE 
(
duration >= 5000000
)
),
ADD EVENT sqlos.wait_info(
ACTION 
(
  sqlserver.client_app_name-- ApplicationName from SQLTrace
, sqlserver.client_hostname-- HostName from SQLTrace
, sqlserver.client_pid-- ClientProcessID from SQLTrace
, sqlserver.database_id-- DatabaseID from SQLTrace
, sqlserver.request_id-- RequestID from SQLTrace
, sqlserver.server_principal_name-- LoginName from SQLTrace
, sqlserver.session_id-- SPID from SQLTrace
)
WHERE
(
duration > 5000 --This one is in milliseconds, and I'm not happy about that
            AND ((wait_type > 0 AND wait_type < 22) -- LCK_ waits
                    OR (wait_type > 31 AND wait_type < 38) -- LATCH_ waits
                    OR (wait_type > 47 AND wait_type < 54) -- PAGELATCH_ waits
                    OR (wait_type > 63 AND wait_type < 70) -- PAGEIOLATCH_ waits
                    OR (wait_type > 96 AND wait_type < 100) -- IO (Disk/Network) waits
                    OR (wait_type = 107) -- RESOURCE_SEMAPHORE waits
                    OR (wait_type = 113) -- SOS_WORKER waits
                    OR (wait_type = 120) -- SOS_SCHEDULER_YIELD waits
                    OR (wait_type = 178) -- WRITELOG waits
                    OR (wait_type > 174 AND wait_type < 177) -- FCB_REPLICA_ waits
                    OR (wait_type = 186) -- CMEMTHREAD waits
                    OR (wait_type = 187) -- CXPACKET waits
                    OR (wait_type = 207) -- TRACEWRITE waits
                    OR (wait_type = 269) -- RESOURCE_SEMAPHORE_MUTEX waits
                    OR (wait_type = 283) -- RESOURCE_SEMAPHORE_QUERY_COMPILE waits
                    OR (wait_type = 284) -- RESOURCE_SEMAPHORE_SMALL_QUERY waits
--OR (wait_type = 195) -- WAITFOR
                )
)
)
ADD TARGET package0.event_file
(
SET filename = 'DurationOver5Sec.xel',
max_file_size = 10,
max_rollover_files = 5
)
WITH 
(
MAX_MEMORY = 10MB
, MAX_EVENT_SIZE = 10MB
, STARTUP_STATE = ON
, MAX_DISPATCH_LATENCY = 5 SECONDS
, EVENT_RETENTION_MODE = ALLOW_MULTIPLE_EVENT_LOSS
);
ALTER EVENT SESSION DurationOver5Sec
ON SERVER
STATE = START;

What’s it all mean?

This captures all SQL Batch Completed and RPC Completed events that took over 5 seconds along with any waits that took over 5 seconds. Seems kind simple and much easier to read than the script to create a trace, but there are some things to point out here.

  • Duration – Milliseconds or Microseconds
  • File Location
  • Restart on server reboot

First, duration, is it milliseconds or microseconds? Trick question, it’s both!!! Ola Hallengren (b|t) opened This Connect Item which resulted in Microsoft apologizing and saying they’ll fix it in SQL 2016. They can “fix” it one of two ways. Either they’ll document them as being different scales and your script will still work without any modifications, or they’ll make them all be the same scale and the script I have here will behave differently on different versions of SQL Server. Anyways, the script I have here is 5 seconds all around, which is 5,000,000 microseconds for the SQL and 5,000 milliseconds for the waits.

Continuing on the duration confusion, the wait info’s duration still isn’t what you might think it is.  This is the duration for a single instance of a wait, not cumulative waits for the duration of a specific query or batch.  If you made a loop that executed 10 times calling a wait for delay of 1 second each time then those are 10 separate 1 second waits.  More important, if your query is doing physical reads from disk then it’s a ton of small PageIOLatch waits, not one large one this session will see.  I understand why it has to be this way, but understanding and enjoying are two different things.

The rest of this isn’t nearly as confusing, I promise.

The file location I’m using is just a filename without a path, which will default to where your error log files are. It’s going to be a much more complex name once SQL Server is done with it, and not just a underscore with a rollover count like server-side traces.  However, it will start how it’s written here and you can use wildcards to say which files you want to read.

Now when you restarted the server, or just SQL services, with server-side traces you’d have to manually start that trace again if you wanted it to run all the time. This is exactly how the script I have here works, too. However, Extended Events also added “Startup State” which means it will start when the server starts. The only issue I have with this is that it will ONLY automatically start when the services start. That’s all fine and good if you’re a patient person….it’s not all fine and good for me, so I manually start it when it’s created.

The other minor details I can see having questions on are on the Max Dispatch Latency and Event Retention Mode. These are limits on how it writes to the file location I’m using. Max Dispatch means that SQL Server will write it to the output file within that many seconds after the event actually happens, so it’s obviously an asynchronous action. Event Retention Mode can give SQL Server permission to say it’s too overworked to write the entries and skip them if needed, and I chose to tell SQL Server to drop as many as it wants to if I’d be slowing it down.  It’s all documented in BOL, and it’d be good for you to read through that instead of just listening to me picking out select details.

So, how do you read this? It’s going to put it all in XML, isn’t it!?!?

If you can read this, Kendra Little (b|t) may suggest you’re a demon robot! (While her comment makes me laugh, that post is a great XE reference or it wouldn’t be on here.) Well, she’s probably right in calling this next script names because friendly robots don’t treat your CPU like this. In fact, if you’re on a stressed server or want to do a lot of analysis on this, it’s going to be best to copy the .xel files to a server you care a little less about and shred the XML there.

Unfortunately, I wrote my script to automatically find the files and add a wildcard character, so you’d have to modify my script to run it on another server. The point is getting you started with as little frustration as possible, and this is perfect for that reason. The biggest catch is that the session has to be running for this to work without changes because it’s grabbing the full file name with path from the DMVs for running sessions.

Anyways, here goes….

DECLARE 
@SessionName SysName 
, @TopCount Int = 1000
--SELECT @SessionName = 'UserErrors'
SELECT @SessionName = 'DurationOver5Sec'
--SELECT @SessionName = 'system_health'
/* 
SELECT * FROM sys.traces
SELECT  Session_Name = s.name, s.blocked_event_fire_time, s.dropped_buffer_count, s.dropped_event_count, s.pending_buffers
FROM sys.dm_xe_session_targets t
INNER JOIN sys.dm_xe_sessions s ON s.address = t.event_session_address
WHERE target_name = 'event_file'
--*/SET STATISTICS IO, TIME ON
IF OBJECT_ID('tempdb..#Events') IS NOT NULL BEGIN
DROP TABLE #Events
END
IF OBJECT_ID('tempdb..#Queries') IS NOT NULL BEGIN
DROP TABLE #Queries 
END
DECLARE @Target_File NVarChar(1000)
, @Target_Dir NVarChar(1000)
, @Target_File_WildCard NVarChar(1000)
SELECT @Target_File = CAST(t.target_data as XML).value('EventFileTarget[1]/File[1]/@name', 'NVARCHAR(256)')
FROM sys.dm_xe_session_targets t
INNER JOIN sys.dm_xe_sessions s ON s.address = t.event_session_address
WHERE s.name = @SessionName
AND t.target_name = 'event_file'
SELECT @Target_Dir = LEFT(@Target_File, Len(@Target_File) - CHARINDEX('\', REVERSE(@Target_File))) 
SELECT @Target_File_WildCard = @Target_Dir + '\'  + @SessionName + '_*.xel'
--SELECT @Target_File_WildCard
SELECT TOP (@TopCount) CAST(event_data AS XML) AS event_data_XML
INTO #Events
FROM sys.fn_xe_file_target_read_file(@Target_File_WildCard, null, null, null) AS F
ORDER BY File_name DESC
, file_offset DESC 
SELECT  EventType = event_data_XML.value('(event/@name)[1]', 'varchar(50)')
, Duration_sec = CAST(event_data_XML.value ('(/event/data[@name=''duration'']/value)[1]', 'BIGINT')/CASE WHEN event_data_XML.value('(event/@name)[1]', 'varchar(50)') LIKE 'wait%' THEN 1000.0 ELSE 1000000.0 END as DEC(20,3)) 
, CPU_sec = CAST(event_data_XML.value ('(/event/data[@name=''cpu_time'']/value)[1]', 'BIGINT')/1000000.0 as DEC(20,3))
, physical_reads_k = CAST(event_data_XML.value ('(/event/data  [@name=''physical_reads'']/value)[1]', 'BIGINT')/1000.0 as DEC(20,3))
, logical_reads_k = CAST(event_data_XML.value ('(/event/data  [@name=''logical_reads'']/value)[1]', 'BIGINT') /1000.0 as DEC(20,3))
, writes_k = CAST(event_data_XML.value ('(/event/data  [@name=''writes'']/value)[1]', 'BIGINT')/1000.0 as DEC(20,3))
, row_count = event_data_XML.value ('(/event/data  [@name=''row_count'']/value)[1]', 'BIGINT')
, Statement_Text = ISNULL(event_data_XML.value ('(/event/data  [@name=''statement'']/value)[1]', 'NVARCHAR(4000)'), event_data_XML.value ('(/event/data  [@name=''batch_text''     ]/value)[1]', 'NVARCHAR(4000)')) 
, TimeStamp = DateAdd(Hour, DateDiff(Hour, GetUTCDate(), GetDate()) , CAST(event_data_XML.value('(event/@timestamp)[1]', 'varchar(50)') as DateTime2))
, SPID = event_data_XML.value ('(/event/action  [@name=''session_id'']/value)[1]', 'BIGINT')
, Username = event_data_XML.value ('(/event/action  [@name=''server_principal_name'']/value)[1]', 'NVARCHAR(256)')
, Database_Name = DB_Name(event_data_XML.value ('(/event/action  [@name=''database_id'']/value)[1]', 'BIGINT'))
, client_app_name = event_data_XML.value ('(/event/action  [@name=''client_app_name'']/value)[1]', 'NVARCHAR(256)')
, client_hostname = event_data_XML.value ('(/event/action  [@name=''client_hostname'']/value)[1]', 'NVARCHAR(256)')
, result = ISNULL(event_data_XML.value('(/event/data  [@name=''result'']/text)[1]', 'NVARCHAR(256)'),event_data_XML.value('(/event/data  [@name=''message'']/value)[1]', 'NVARCHAR(256)'))
, Error = event_data_XML.value ('(/event/data  [@name=''error_number'']/value)[1]', 'BIGINT')
, Severity = event_data_XML.value ('(/event/data  [@name=''severity'']/value)[1]', 'BIGINT')
, EventDetails = event_data_XML 
INTO #Queries
FROM #Events
SELECT q.EventType
, q.Duration_sec
, q.CPU_sec
, q.physical_reads_k
, q.logical_reads_k
, q.writes_k
, q.row_count
, q.Statement_Text
, q.TimeStamp
, q.SPID
, q.Username
, q.Database_Name
, client_app_name = CASE LEFT(q.client_app_name, 29)
WHEN 'SQLAgent - TSQL JobStep (Job '
THEN 'SQLAgent Job: ' + (SELECT name FROM msdb..sysjobs sj WHERE substring(q.client_app_name,32,32)=(substring(sys.fn_varbintohexstr(sj.job_id),3,100))) + ' - ' + SUBSTRING(q.client_app_name, 67, len(q.client_app_name)-67)
ELSE q.client_app_name
END  
, q.client_hostname
, q.result
, q.Error
, q.Severity
, q.EventDetails
FROM #Queries q
--WHERE eventtype NOT IN /*rather typical filtering*/ ('security_error_ring_buffer_recorded', 'sp_server_diagnostics_component_result', 'scheduler_monitor_system_health_ring_buffer_record')
--AND eventtype NOT IN /*specific troubleshooting filtering*/ ('connectivity_ring_buffer_recorded', 'wait_info')
ORDER BY TimeStamp DESC 

The top is all I typically edit. The Session Name is ‘DurationOver5Sec’ for this purpose, but as you can see it can also read some of the information from the system_health session. The TopCount is just saying I’m only going to look at the most recent 1,000 records BEFORE I shred and filter it, mostly because I love my CPU and this changes how much XML I’m throwing at it.

No matter how I write this to filter before playing with XML or try to be 100% safe, my scripts will not be perfect for you.  In fact they may crash your server. Unfortunately, I’m only half kidding, because nothing is 100%. The space is limited to 50 MB, the overhead of writing to this file should be fine since we’re only working with events that took over 5 seconds asynchronously with data loss allowed, so I don’t foresee any issues in most cases unless you change these numbers. However, reading them gets expensive and you have to use the most caution here.  Keep in mind that you can put the .xel files on any server with SQL installed, even developer edition, and query it.

Why’d I write this and what else is going through my head?

The inspiration for this post actually came from a question a former coworker asked me quite a while ago, just seeing if I knew a lot about Extended Events.  At the time the technically correct answer was “I don’t know”, but instead I told him that although I didn’t have experience with Extended Events that he could look at Jonathan Kehayias’s (b|t) blog, specifically his XEvent A Day Series.  However, I also warned him that Jonathan’s work tends to go deeper than a beginner is prepared for and that he should start with his coworker, Erin Stellato’s (b|t), version that is easier to start with, specifically her Making the Leap From Profilers to Extended Events session she made available on YouTube. This actually inspired two blog posts, I just haven’t written the How to Say “I don’t know” Better post yet.

More Links

It is so wrong that I originally posted this without linking to Erin Stellato’s Extended Events Stairway Series on SQL Server Central.  There, problem is fixed.

Have fun, experiment (with queries), and ask questions.

Filed under: Monitoring, SQL Server, Troubleshooting, Tuning Tagged: baseline data, monitoring, Profiler, Script, XEvent

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating