June 11, 2020 at 3:00 am
Likes to play Chess
June 11, 2020 at 10:30 am
SELECT event_data = CONVERT(XML, event_data)
FROM sys.fn_xe_file_target_read_file(N'system_health*.xel', NULL, NULL, NULL)
WHERE event_data like '%xml_deadlock_report%';?
John
June 11, 2020 at 11:10 am
Unless someone has explicitly turned it off, and since most people don't even know it's there, on every single server, 2012+, you have an Extended Events session running called system_health. It captures deadlock info. Here's a query to use system_health:
WITH fxd
AS (SELECT CAST(fx.event_data AS XML) AS Event_Data
FROM sys.fn_xe_file_target_read_file(N'system_health*.xel', NULL, NULL, NULL) AS fx )
SELECT dl.deadlockgraph
FROM
(
SELECT dl.query('.') AS deadlockgraph
FROM fxd
CROSS APPLY event_data.nodes('(/event/data/value/deadlock)') AS d(dl)
) AS dl;
Now, in some systems, with a very high rate of errors, deadlocks, just general activity, you may only have a few hours or so of information. However, for many systems, you may have days or event weeks of info in the system_health data. This query works on AWS RDS as well. I haven't tested it on Azure SQL Database, but I'll put money down it works there too.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
June 11, 2020 at 6:45 pm
Heh... sounds almost as bad as the default trace. We blow through that in about 10 minutes.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 12, 2020 at 1:03 am
Mine ran 45 minutes.... THANK YOU, GRANT, for your response this morning. Saved me a day of on submitting prod request/waiting for result. This is more worrisome though: <<Now, in some systems, with a very high rate of errors, deadlocks, just general activity, you may only have a few hours or so of information. However, for many systems, you may have days or event weeks of info in the system_health data. >>
(reading your Query Store book today, by the way. Have a couple of questions but i like the concise and streitngtothepoint format of it). So i am now definitely starting another perf collector routine (i already have created a couple of good ones for QuerySTore)
to save deadlock info into a historical table every day.
QUESTION:
What is the best datatype for the column in which I will be saving XML Deadlock Graphs?
Likes to play Chess
June 12, 2020 at 8:20 am
xml is the best data type, so that you can easily shred the deadlock graphs to make them understandable. By the way, I recommend you capture deadlocks more often than daily. As Jeff pointed out, you can soon lose stuff from the system health session on a busy server. I capture every 15 minutes on my servers.
John
June 12, 2020 at 10:02 am
Heh... sounds almost as bad as the default trace. We blow through that in about 10 minutes.
Ha! Yeah, it is.
However, you can edit the system_health session if you want to and add additional files or make the files bigger. I'd be cautious about it, but the option is there.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
June 12, 2020 at 10:03 am
xml is the best data type, so that you can easily shred the deadlock graphs to make them understandable. By the way, I recommend you capture deadlocks more often than daily. As Jeff pointed out, you can soon lose stuff from the system health session on a busy server. I capture every 15 minutes on my servers.
John
Agree with XML. Not only is it the same as the deadlock graph from the system_health session, but it also can be opened in the SSMS gui if you want.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
June 12, 2020 at 3:22 pm
The deadlock graph contains the SQL statements involved in the deadlock, which may include parameter values.
If your database contains PII or other data that must follow regulations (GDPR, HIPAA, etc.), then assume your captured deadlock graphs contain regulated data, so access should be locked down and stale data should be routinely deleted to ensure any protected data does not remain longer than allowed. If you turn on trace flags to send deadlock details to the SQL Server Log, then the log could contain PII and you will need to also ensure your instance logs are trimmed and controlled.
Eddie Wuerch
MCM: SQL
June 13, 2020 at 7:14 pm
and what are advantages of using trace flags for saving deadlock data to transaction log compared to seeing them in the default XE sessions?
Likes to play Chess
June 14, 2020 at 3:25 am
and what are advantages of using trace flags for saving deadlock data to transaction log compared to seeing them in the default XE sessions?
Heh... it's a part of EVERYTHING that I forgot to mention on the other post. 😀
The real advantage is that you don't have to use Extended Events!
--Jeff Moden
Change is inevitable... Change for the better is not.
June 16, 2020 at 11:40 am
and what are advantages of using trace flags for saving deadlock data to transaction log compared to seeing them in the default XE sessions?
I'd say there aren't any advantages. Plucking the deadlock graphs straight out of the system_health is easy. There's not a thing you have to turn on to get system_health info. Because you can use the GUI, if you want, the graphs are easier to read than the error log. You don't have to sweat forgetting to turn on the trace flag. No system reboots.
The one shortcoming is, for some systems, the default size & number of system_health data files might not keep up with the load on the system. Meaning, there could only be a few hours of data. Most systems, this won't be the case. Some systems it will be. So, if you are in the "some" category, you can adjust the system_health Session and make more or bigger files. Just make sure you've got the disk capacity before you do so.
Other than that, as far as I'm concerned, it's all up to use system_health. Did you know it also captures long waits so you can see queries that ran long? Lots of great information in the system_health Extended Events Session.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
June 18, 2020 at 2:02 am
Thank you, Grant, for a very clear answer!
Continuing the subject, there is a tough question that has been processing itself through me for quite a few months already.
How (exactly) can we connect all these pieces? sp_WhoIsActive dump, or all DMVs based stuff , for that matter, with the Query Store data and all that with the Extended events info. I am almost sure most counters/rations won't match between them.. i failed to link QS (query Hash) to dmv /sp_WhoisActive data, no match at all, not even a single one.
But the bigger question is should I really seek such synergy? in what scenario would i want to look at the same production failure via all three tools data COMBINED? either matched/joined or not. Say, for same exact time period, say, one hour. what will i definitely see in one that i will definitely not see in another?
I think it is the kind of question that deserves to belong to the front page of https://www.scarydba.com !
as the 3 tools are different, and pursue a bit different purposes. yet there is a lot in common, too. essential things are... but I am not quite sure how and how much different the 3 "stores" are. Your input on this is highly appreciated.
Likes to play Chess
June 18, 2020 at 2:08 am
I think that one of the best answers (other than, of course, "It Depends) comes in the form of a suite of tools by Mr. Brent Ozar. I just refer to it as the sp_Blitz suite. He has some absolutely incredible tools in it. It's pretty easy to find with Yabingooducklehoo".
As for seeking the synergy that you describe, it may be possible but I don't know if such a thing would ever be worth it. It would be like knowing the history of everything to do with man made lighting. It's all on the same subject but it would be like knowing what candles have to do with fresnel lenses that have to do with lime light that have to do with skip bombs dropped from planes during WW II, etc, etc. They are connected but knowing how all those fit together might not have a thing to do with what you're trying to accomplish. The candle might be all that's necessary to get the job done correctly.
People like Brent have already figured out what the necessary connections are and have built packages to see the results of such connections or the "synergy" your looking for in a single subject. He even provides links to his articles in the output.
To wit, look for a YouTube on how to use sp_BlitzIndex. Never mind... here's a couple of links to flicks on two of his most popular packages...
https://www.youtube.com/watch?v=8Wo5M7kYO20
https://www.youtube.com/watch?v=5QAL9itupMA
As a bit of a sidebar, Brent's sp_BlitzCache does about the same thing as my sp_ShowWorst but I didn't know about sp_BlitzCache when I wrote sp_ShowWorst.
https://www.youtube.com/watch?v=EkLuXURMwso
He also has a lot of these tools all bundled up in what he calls his "First Responder Kit" and it's freakin' awesome.
https://www.brentozar.com/first-aid/
And it's all free.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 18, 2020 at 11:39 am
Jeff's got a great point with Brent's tools. However, they still don't do synergy. They're focused on the DMVs, which are fantastic, but not complete. There's info in Query Store not in the DMVs. There's info in Extended Events not in either of the other two.
Honestly, this is tough. I work for a company that makes a monitoring tool. We are trying to achieve that synergy. We use the DMVs and Extended Events (despite my constant pushing, we're still not leveraging Query Store). Yet, I wouldn't say we have full synergy. The fact is, there's too much. There's too much detail.
To a large degree, you have to think about the tools as separate tools. They all kind of do the same thing, yes. However, they're all very focused in different ways. The DMVs are quite good for an immediate overview. What's happening right now, or, what's happened relatively recently. However, the recent past represented by the DMVs is limited to what's in cache and it's an aggregate. If you need info on stuff that was never in cache or has been flushed out, you need to use a different tool.
Query Store is great for an historical view of query behavior. It's fantastic because it shows you a history of execution plans, as well as query performance metrics. Best of all, while the data is aggregated, the aggregations are broken up into time windows which gives us that all important "before and after" measurement capability. Before the recompile and after. Before the edit to the procedure and after. Before the new index and after. You get the idea. Query Store also introduces a vital missing bit of info from the DMVs, the standard deviation of it's aggregations. It's a great tool. However, it's an aggregate again. What do you do when you need the detail. Not simply, for example, what parameter a given procedure was compiled on, but specifically what parameter was used when it ran really long at 2AM? That's not there in Query Store or the DMVs.
Enter Extended Events. Here, we get all the detail. We can capture every single call. Every single statement. Recompiles, stats updates, cache misses, cache hits, more and more. Further through Causality Tracking, we can combine this information easily in ways we've never been able to before. Love me some Extended Events. However, consuming this data is a giant PITA. I teach a bunch of ways to deal with it. It's still not easy. Plus, once you're getting all the detail, you're getting all the detail. It's on you now to aggregate that info again, because, as shown above in the other tools, aggregation is useful.
Each of these tools serves a different purpose. Think of them like a hammer, a screwdriver and a wrench. All three are for attaching something. Nails, screws & bolts. Each has a specific purpose. To a degree, and poorly, you can even try using each tool to do the others job. However, really, your best bet is to focus on purpose and use the tool to do the thing you need rather than try to make any one tool do all of them.
Your choices. Personally, I'd recommend getting a 3rd party monitoring tool (I can recommend a very good one if you like). It gets you some of the synergy you're looking for. Then, as needed, you can plunge into the tools to do what they uniquely do. That's my 2 cents on it.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply