May 27, 2017 at 9:59 pm
Matt Miller (4) - Saturday, May 27, 2017 5:33 PM
Sorry, one more update after scrolling up to your final request. If you want to subset the ModTrackingInfo fields to only what you want, you need that last CROSS APPLY to stop at ModTrackingInfo.
Try this one.
SELECT xt.RowNum
--,recomp.data.query('*')
,Recomp.data.value('(*:Field[@FieldName="wszDb"])[1]/@FieldValue','NVARCHAR(128)') wszDb
,Recomp.data.value('(*:Field[@FieldName="wszSchema"])[1]/@FieldValue','NVARCHAR(128)') wszSchema
,Recomp.data.value('(*:Field[@FieldName="wszTable"])[1]/@FieldValue','NVARCHAR(128)') wszTable
,STU.data.value('(*:Field[@FieldName="wszColName"])[1]/@FieldValue','NVARCHAR(128)') wszColName
,STU.data.value('(*:Field[@FieldName="wszStatName"])[1]/@FieldValue','NVARCHAR(128)') wszColName
,STU.DATA.value('local-name(../.)','NVARCHAR(128)') AS PARENT
,STU.DATA.value('local-name(.)' ,'NVARCHAR(128)') AS ELEMENT
--,STU.DATA.value('(*:Field)[1]/@FieldName' ,'NVARCHAR(128)') AS FieldName
--,STU.DATA.value('(*:Field)[1]/@FieldValue' ,'NVARCHAR(128)') AS FieldVAlue
FROM #XMLTest xt
CROSS APPLY TheXML.nodes('//*:Recompile') RECOMP(DATA)
CROSS APPLY RECOMP.DATA.nodes('*:ModTrackingInfo') STU(DATA);
That's absolutely perfect. I was just about to post what I came up with, and I must have done it correctly because it's almost identical to what you posted above. To top it off, it's a Saturday on a holiday weekend (in the USA, anyway) and the very same fellow who started me out on XML a couple of years ago provides instant verification/gratification.
I originally ran Mark's good code last night against one of my production servers. It did exactly what was needed, but it was slow; it took more than 4 hours to go through just 61,000 entries. I figured that the "bottom up" approach got a little expensive, and so I was looking for (pestering everyone for :blush: ) the "top down" approach, where the database information is rendered only once instead of once for every occurrence of the ModTrackingInfo.
Not having seen Matt's post above yet, I thought, "Damn. If I have someone show me the final answer, then I've learned nothing" and got busy.
I went back and looked at Johan's and Eirikur's posts because I remembered they had both used APPLY and felt that might be the way to go. That's when I remembered the post (from years ago) on "flattening XML" that Matt and I were both on. Matt had taught me much about how to use OUTER APPLY for things like this, and I had forgotten most of it because I've not had to work with XML since then. That thread has a good example, but I thought the number of OUTER APPLYs would get expensive, so I went back to Mark's code for the code in the SELECT list. It also dawned on me that Mark's code actually does the filtering at the FIELD level I needed, as well.
One of the problems with the data that I'm having to "shrefen" (heh... a "Modenism" for "shred and flatten" XML... just rolls off the tongue, doesn't it? 🙂 ) is that some of the entries don't contain the "Recompile" element at all. Others contain the "Recompile" element but not the "ModTrackingInfo" element. In fact, about half of all the entries are in one of those two conditions. It's a waste of time even looking at them. That's when I remembered the WHERE clause that Johan had posted.
Going back to Eirikur's example, I picked out his method for avoiding the namespace. Rumor has it that MS ships a file with SQL Server for that namespace that could improve performance on a necessarily non-indexed XML column, but now I have two slightly different bits of code to test against the system.
Stitching it all together, here's what I came up with before looking at Matt's code above. Like I said, instant gratification that I finally figured this one out. I say "finally" because, as simple as the end result is, it took me a long time to figure out the simple method. What I really like about it is that there are no dot-dot notations or backslashes in the SELECT list.
--===== Namespace version
WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' AS p)
SELECT tbl.RowNum
,DbName = db.DbInfo.value('(p:Field[@FieldName="wszDb"])[1]/@FieldValue','SYSNAME')
,SchemaName = db.DbInfo.value('(p:Field[@FieldName="wszSchema"])[1]/@FieldValue','SYSNAME')
,TableName = db.DbInfo.value('(p:Field[@FieldName="wszTable"])[1]/@FieldValue','SYSNAME')
,StatName = si.SchemaInfo.value('(p:Field[@FieldName="wszStatName"])[1]/@FieldValue','SYSNAME')
,ColumnName = si.SchemaInfo.value('(p:Field[@FieldName="wszColName"])[1]/@FieldValue','SYSNAME')
FROM #XMLTest tbl
OUTER APPLY tbl.TheXML.nodes('//p:Recompile') db (DbInfo)
OUTER APPLY db.DbInfo.nodes('p:ModTrackingInfo') si (SchemaInfo)
WHERE db.DbInfo.exist(N'p:ModTrackingInfo') = 1
;
--===== Non-namespace version
SELECT tbl.RowNum
,DbName = db.DbInfo.value('(*:Field[@FieldName="wszDb"])[1]/@FieldValue','SYSNAME')
,SchemaName = db.DbInfo.value('(*:Field[@FieldName="wszSchema"])[1]/@FieldValue','SYSNAME')
,TableName = db.DbInfo.value('(*:Field[@FieldName="wszTable"])[1]/@FieldValue','SYSNAME')
,StatName = si.SchemaInfo.value('(*:Field[@FieldName="wszStatName"])[1]/@FieldValue','SYSNAME')
,ColumnName = si.SchemaInfo.value('(*:Field[@FieldName="wszColName"])[1]/@FieldValue','SYSNAME')
FROM #XMLTest tbl
OUTER APPLY tbl.TheXML.nodes('//*:Recompile') db (DbInfo)
OUTER APPLY db.DbInfo.nodes('*:ModTrackingInfo') si (SchemaInfo)
WHERE db.DbInfo.exist(N'*:ModTrackingInfo') = 1
;
My hat's off to you good folks for helping me learn this stuff, especially since I have an immediate need for it to solve a nasty production problem. You've just gotta love this community!
I'm off to the races to see what the performance of these methods is. Oddly enough, the original column of data doesn't actually exist at run time, so a direct XML index isn't possible. I've been playing with a fix for that, as well. Once I'm done, "I'll be back" with an update.
Thank all of you again. There's a piece of what every one of you posted in the code and I couldn't have figured it out without you.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 28, 2017 at 5:15 am
Haven't had time to look at this but at first glance I must say that this is not bad at all, Jeff. The only suggestion for immediate improvement is to remove the "exist" in the WHERE clause and use IS NOT NULL instead; that removes 4 operators from the execution plan.
🙂
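A minimal sketch of that suggestion (untested here; it assumes Jeff's non-namespace query above, its aliases, and that wszStatName is always present inside ModTrackingInfo):
-- Replaces: WHERE db.DbInfo.exist(N'*:ModTrackingInfo') = 1
-- With OUTER APPLY, rows lacking ModTrackingInfo return NULL and drop out.
WHERE si.SchemaInfo.value('(*:Field[@FieldName="wszStatName"])[1]/@FieldValue','SYSNAME') IS NOT NULL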
There could also be a benefit of adding a temp table into the mix and first inserting all the Recompile elements into the temp table before querying for the attribute values, something like this:
;WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' as p)
SELECT
xt.RowNum
,RECOMP.DATA.query('.') AS ELEMENT_XML
FROM #XMLTest xt
OUTER APPLY TheXML.nodes('//p:Recompile') RECOMP(DATA)
This would greatly reduce the size and the complexity of the XML work-set.
Further, you should use OPTION (MAXDOP 1); if this goes parallel, it will be very slow!
Regarding the namespace, there is virtually no difference between the wildcard namespace and the defined namespace queries when there is only one namespace used in the XML. The wildcard approach is often handy but sometimes not applicable depending on the structure of the XML, i.e. if same elements exist within different namespaces etc.
May 28, 2017 at 12:14 pm
Eirikur Eiriksson - Sunday, May 28, 2017 5:15 AM
Haven't had time to look at this but at first glance I must say that this is not bad at all, Jeff. The only suggestion for immediate improvement is to remove the "exist" in the WHERE clause and use IS NOT NULL instead; that removes 4 operators from the execution plan.
🙂
There could also be a benefit of adding a temp table into the mix and first inserting all the Recompile elements into the temp table before querying for the attribute values, something like this:
;WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' as p)
SELECT
xt.RowNum
,RECOMP.DATA.query('.') AS ELEMENT_XML
FROM #XMLTest xt
OUTER APPLY TheXML.nodes('//p:Recompile') RECOMP(DATA)
This would greatly reduce the size and the complexity of the XML work-set.
Further, you should use OPTION (MAXDOP 1); if this goes parallel, it will be very slow!
Regarding the namespace, there is virtually no difference between the wildcard namespace and the defined namespace queries when there is only one namespace used in the XML. The wildcard approach is often handy but sometimes not applicable depending on the structure of the XML, i.e. if the same elements exist within different namespaces etc.
Great minds think alike. I was already setting up to do just that, but your code is freakin' awesome compared to mine. I was using ".query" to get that into a working table but, because I did it all in the SELECT list (I didn't know about using the "." there. Great tip! Thanks!), I had to add a "root" node to it because there are, many times, multiple instances of the "Recompile" node. Your code does a great job because it 1) splits each instance out to a separate row (which also makes the snippets smaller), 2) is much faster because it avoids the previously necessary concatenation of a "root" node to make well-formed XML when multiple occurrences of the same tag are present, and 3) follows the very nice standard of doing it all through the FROM clause instead of the SELECT list like I had originally done using .query.
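To make the contrast concrete, here's a minimal sketch (untested, using the wildcard-namespace style and the #XMLTest table from earlier in the thread):
-- SELECT-list approach: multiple Recompile nodes have to be wrapped in a
-- manufactured root element to stay well-formed XML (my original method).
SELECT xt.RowNum
,AllRecompiles = xt.TheXML.query('<root>{//*:Recompile}</root>')
FROM #XMLTest xt;
-- FROM-clause approach: nodes() splits each Recompile onto its own row,
-- so no root wrapper is needed (Eirikur's method).
SELECT xt.RowNum
,OneRecompile = RECOMP.DATA.query('.')
FROM #XMLTest xt
CROSS APPLY xt.TheXML.nodes('//*:Recompile') RECOMP(DATA);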
--Jeff Moden
Change is inevitable... Change for the better is not.
May 28, 2017 at 3:10 pm
Going for the brass ring... here's a progress report.
The following code does the "temp table" optimization that Eirikur suggested. It examined 84,817 execution plans, eliminated 6,102 of them (about 7%) for having a NULL return or no "Recompile" element, and saved the resulting 78,715 reduced XML fragments in 00:08:29. That's atrocious for most things we do but, considering that it IS XML and it IS generated on-the-fly by the MS function for query plan XML generation and it IS for the entire server and some of the XML IS huge, that's not too bad.
Here's the code I used if anyone is interested.
--===== The tables we're hitting are used a whole lot and we want to suffer as little blocking as possible.
-- We also want all of the information possible even if it causes Before'n'After rows or dupes,
-- which will be eliminated later. With that in mind, this is one of the first places I've seen
-- where such anomalies just aren't going to matter, so we'll shift to READ UNCOMMITTED everywhere.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
;
--===== If the working table already exists, drop it.
IF OBJECT_ID('Scratch_Prod.scratch.StatsPreProcess','U') IS NOT NULL
DROP TABLE Scratch_Prod.scratch.StatsPreProcess
;
--===== Enable stats reporting in the execution plans.
DBCC TRACEON(8666)
;
WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' as p)
SELECT --TOP 1000
XMLFragment = db.DbInfo.query('.')
INTO Scratch_Prod.scratch.StatsPreProcess
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
CROSS APPLY qp.query_plan.nodes('//p:Recompile') db (DbInfo)
OPTION (MAXDOP 1)
;
DBCC TRACEOFF(8666)
;
It did seem that the addition of any other filtering at this point caused things to go about 8 times slower.
Now for a little "shrefening". I'll be back.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 28, 2017 at 4:24 pm
There's only one term that comes to mind after all the incredible tutelage I've received on this thread and what it has resulted in and that would be UNBELIEVABLE!
When I first ran across Fabiano's post, I gave it a try. It took more than 10 minutes just to return the names of the stats that came into play, and I still couldn't use that information because the names of stats are only unique within a database, not across all the databases the code necessarily spans. To top it off, that was on only about 17,000 stored plans at the time.
To summarize what you good folks have helped me figure out...
1. The pre-processing into a temporary semi-permanent working table took only 00:08:29 and that was on a whopping 84,817 execution plans. That's 5 times the number of execution plans that I had started on and it actually returns useful information now.
2. Then, using everything else you good folks taught me, the following code produces a list of distinct statistics names along with the database and table each stat is associated with in just 00:01:05 and that's without doing any XML indexing.
3. The total time for the whole shebang is 00:09:24, and it produces nearly no load on the server at all because the target tables are in a scratch database that uses the SIMPLE recovery model, which allows SELECT INTO to be minimally logged and about twice as fast as it would otherwise be.
4. Because of the very light load and the comparatively low run time involved, I can run the code once an hour without being obtrusive. Over time (a month or so, so that I can pick up even on month-end stats usage), this will allow me to create a list of stats that are actually used. So, what am I going to do with that? My big problem has been rebuilding stats on a regular basis. We have some insane tables that are more than 100 columns wide (not my design, I assure you) and a couple of hundred million rows long, and some ridiculous stats rebuild code that the well-meaning folks before me used. The thing that makes it ridiculous is that it examines all available stats and compares them to all available columns. If a column has no column stats on it, the code automatically creates them. Even doing a "clean" stats rebuild of only the stats that need it takes more than 10 hours for just 1 of the databases on our primary production server. This code will allow me to find all unused stats (which can also naturally happen over time) and safely drop them.
We have more than 17,000 column-based stats just in one of our main databases. The code below identified that fewer than 5,000 were used in the last 48 hours for the entire server! Yes, I do realize that's partially due to execution plans dropping out of proc cache but, at less than 10 minutes a pop, I can build a true history over a month and do the stats drops surgically instead of dropping them willy-nilly and letting the server rebuild them when a query needs them. For some of the larger tables, that rebuild was taking more than 10 minutes each. Even if I don't drop the stats, I can still do a surgical rebuild until I get enough history to drop truly unused stats.
Here's the code that pulls the necessary stats info for all of this from the working table in my previous post. It looks damned simple now but 3 days ago, I was at a total loss for how to pull this off. Thank you, thank you, one and all. I might even be able to get a good night's sleep tonight knowing that the hard part of this fire-drill is now behind me.
--===== If the working table already exists, drop it.
IF OBJECT_ID('Scratch_Prod.scratch.StatsDbTableInfo','U') IS NOT NULL
DROP TABLE Scratch_Prod.scratch.StatsDbTableInfo
;
--===== Find all used stats by StatName and which database and table they're associated with, and store
-- the unique occurrences in a working table for comparisons over time.
WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' AS p)
SELECT DISTINCT
DbName = db.DbInfo.value('(p:Field[@FieldName="wszDb"])[1]/@FieldValue','SYSNAME')
,SchemaName = db.DbInfo.value('(p:Field[@FieldName="wszSchema"])[1]/@FieldValue','SYSNAME')
,TableName = db.DbInfo.value('(p:Field[@FieldName="wszTable"])[1]/@FieldValue','SYSNAME')
,StatName = si.StatsInfo.value('(p:Field[@FieldName="wszStatName"])[1]/@FieldValue','SYSNAME')
,ColumnName = si.StatsInfo.value('(p:Field[@FieldName="wszColName"])[1]/@FieldValue','SYSNAME')
INTO Scratch_Prod.scratch.StatsDbTableInfo
FROM Scratch_Prod.scratch.StatsPreProcess tbl
OUTER APPLY tbl.XMLFragment.nodes('p:Recompile') db (DbInfo)
OUTER APPLY db.DbInfo.nodes('p:ModTrackingInfo') si (StatsInfo)
WHERE si.StatsInfo.value('(p:Field[@FieldName="wszStatName"])[1]/@FieldValue','SYSNAME') > N''
;
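Looking ahead (a hypothetical sketch, not tested here): once enough history accumulates, the working table above can be set against sys.stats, per database, to surface candidate unused stats. The auto_created/user_created filter is an assumption meant to exclude index-backed stats:
-- Run in each target database. Column stats that never appeared in the
-- collected usage table are candidates for a surgical drop.
SELECT SchemaName = s.name
,TableName = t.name
,StatName = st.name
FROM sys.stats st
JOIN sys.tables t ON t.object_id = st.object_id
JOIN sys.schemas s ON s.schema_id = t.schema_id
WHERE (st.auto_created = 1 OR st.user_created = 1) -- column stats, not index stats
EXCEPT
SELECT SchemaName, TableName, StatName
FROM Scratch_Prod.scratch.StatsDbTableInfo
WHERE DbName = DB_NAME();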
And, yeah... you can bet there is going to be a small series of articles on this. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
May 29, 2017 at 3:38 am
One just has to love this community!
🙂
Jeff, if you are running this regularly, why not collect the plan_handles separately and only dig into those which are new? That way you wouldn't be repeating a large portion of the work every time.
Something like
SELECT
CP.plan_handle
INTO dbo.TBL_PLAN_HANDLES
FROM sys.dm_exec_cached_plans CP;
and then
SELECT
qp.query_plan
FROM dbo.TBL_PLAN_HANDLES PH
CROSS APPLY sys.dm_exec_query_plan(PH.plan_handle) qp
-- some filter returning new plans
You get my drift 🙂
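For illustration only, one hedged guess at what that filter could look like (it assumes the dbo.TBL_PLAN_HANDLES snapshot above):
SELECT
qp.query_plan
FROM sys.dm_exec_cached_plans CP
CROSS APPLY sys.dm_exec_query_plan(CP.plan_handle) qp
-- only plans whose handle hasn't been snapshotted yet
WHERE NOT EXISTS (SELECT 1
FROM dbo.TBL_PLAN_HANDLES PH
WHERE PH.plan_handle = CP.plan_handle);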
May 29, 2017 at 7:44 am
That's a great idea, Eirikur. Simple, and it will have a major impact by probably reducing the amount of work that actually needs to be done to hundreds of plans instead of tens of thousands. This just keeps getting better and better. I'll let you know how it turns out performance-wise.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 29, 2017 at 8:50 am
Jeff Moden - Monday, May 29, 2017 7:44 AM
That's a great idea, Eirikur. Simple, and it will have a major impact by probably reducing the amount of work that actually needs to be done to hundreds of plans instead of tens of thousands. This just keeps getting better and better. I'll let you know how it turns out performance-wise.
I normally do this with the addition of metrics such as max exec time in addition to the plan hash; if a query starts performing worse, that should trigger an action.
🙂
May 29, 2017 at 12:24 pm
Eirikur Eiriksson - Monday, May 29, 2017 3:38 AM
One just has to love this community!
🙂
Jeff, if you are running this regularly, why not collect the plan_handles separately and only dig into those which are new? That way you wouldn't be repeating a large portion of the work every time.
Something like
SELECT
CP.plan_handle
INTO dbo.TBL_PLAN_HANDLES
FROM sys.dm_exec_cached_plans CP;
and then
SELECT
qp.query_plan
FROM dbo.TBL_PLAN_HANDLES PH
CROSS APPLY sys.dm_exec_query_plan(PH.plan_handle) qp
-- some filter returning new plans
You get my drift 🙂
Would this be better using a CROSS APPLY and then filtering out the ones you already have or using an EXCEPT on the sets? I'm more than a bit out of my element here with XML. I know it's usually better with normal columns, but I don't know about XML. Have you already been down this road?
May 29, 2017 at 2:28 pm
Ed Wagner - Monday, May 29, 2017 12:24 PM
Eirikur Eiriksson - Monday, May 29, 2017 3:38 AM
One just has to love this community!
🙂
Jeff, if you are running this regularly, why not collect the plan_handles separately and only dig into those which are new? That way you wouldn't be repeating a large portion of the work every time.
Something like
SELECT
CP.plan_handle
INTO dbo.TBL_PLAN_HANDLES
FROM sys.dm_exec_cached_plans CP;
and then
SELECT
qp.query_plan
FROM dbo.TBL_PLAN_HANDLES PH
CROSS APPLY sys.dm_exec_query_plan(PH.plan_handle) qp
-- some filter returning new plans
You get my drift 🙂
Would this be better using a CROSS APPLY and then filtering out the ones you already have, or using an EXCEPT on the sets? I'm more than a bit out of my element here with XML. I know it's usually better with normal columns, but I don't know about XML. Have you already been down this road?
The way I normally do it keys not only on plan changes but more on the metrics, i.e. if the same query takes longer, then action something. A CROSS APPLY often filters out the difference, so one has to be careful.
🙂
I do use both a hash (md5) of the query and the plan handle; if you only use the plan handle, you will miss certain changes in behaviour.
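For the EXCEPT option Ed asked about, a hedged sketch (dbo.TBL_PLAN_HANDLES is the snapshot table from the earlier post):
-- Set-based alternative to a NOT EXISTS filter: handles in cache now
-- that aren't in the snapshot yet.
SELECT CP.plan_handle
FROM sys.dm_exec_cached_plans CP
EXCEPT
SELECT PH.plan_handle
FROM dbo.TBL_PLAN_HANDLES PH;
-- A query-level hash such as sys.dm_exec_query_stats.query_hash can be
-- stored alongside the handle so a recompiled plan for the same query is
-- still recognized.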
Back to the pizza 🙂
May 29, 2017 at 2:40 pm
Do you deliver? If you come all the way here, the pasta and vegetable courses are on me and the beer is on Jeff. 🙂🙂🙂
May 29, 2017 at 2:59 pm
Ed Wagner - Monday, May 29, 2017 2:40 PM
Do you deliver? If you come all the way here, the pasta and vegetable courses are on me and the beer is on Jeff. 🙂🙂🙂
Ed, to you I'll deliver any day 🙂
🙂
The oak is giving me 350°C using two thirds of the chamber; not too bad.
And we need to have a chat about the pasta; I've done it for many years, cucina povera 🙂
Of course the beer popsicles are on Jeff!!!
May 30, 2017 at 8:20 am
Eirikur Eiriksson - Monday, May 29, 2017 2:59 PM
Ed Wagner - Monday, May 29, 2017 2:40 PM
Do you deliver? If you come all the way here, the pasta and vegetable courses are on me and the beer is on Jeff. 🙂🙂🙂
Ed, to you I'll deliver any day 🙂
🙂
The oak is giving me 350°C using two thirds of the chamber; not too bad.
And we need to have a chat about the pasta; I've done it for many years, cucina povera 🙂
Of course the beer popsicles are on Jeff!!!
OK if I invite the dust bunnies?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 30, 2017 at 8:50 am
This will likely be the 2nd to the last update on this wonderful community effort. Thank all of you again for the great help and ideas.
I've converted the code to a stored procedure, deployed it, and set up a job to run every 20 minutes on the 7's. The first run (shortly after midnight this morning) took the expected 10 minutes. That first run is the one that does everything in cache. The subsequent runs, where plan handles have already been processed and are only counted as a hit, took literally seconds until this morning when people started hitting the GUIs and running other jobs, but they are usually just a little over a minute with the occasional spike to a little over 3 minutes, and are mostly undetectable on PerfMon (good, efficient code thanks to all your help). Month end is coming up, so I'll turn off the "optimize for ad hoc workloads" setting on the server tomorrow night for a couple of days.
It's remarkable... there are over 17,000 stats in just one of the databases because of the bloody stats code that someone implemented a little over 5 years ago, and because of some people (who know better now) who used DTA a lot and let it have its way with the creation of stats and indexes alike (indexes are next on my target list even though I no longer do any index maintenance). So far, 120,000 plans have been checked and the stats collector is telling me that only 6,800 statistics are actually being used... on the whole bloody server!
Like I said previously, it was taking more than 10 hours just to update stats on that 17,000-stat database, so we only did it once per week, and it didn't catch some of the ones that actually needed to be updated because of a stupid algorithm that updated stats based on the number of times some indexes on the tables were hit. That killed us because some of the bigger tables in the batch-oriented databases were only being hit once or twice a day and those were ignored by the current system. Using this tool to identify which stats are actually being used, I should be able to identify and drop all the non-essential user-generated stats (the bloody stats job being the worst offender there) and auto-generated stats that were truly used only for "one off" queries. Yep... a mistake or two may be made, but it won't be nearly as bad as the frequent recommendation to "drop all column based stats".
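A hypothetical sketch of that last step (the working table name follows the earlier post; review every row before running anything it prints):
-- Generate, don't execute: DROP STATISTICS commands for column stats in
-- the current database that never showed up in the collected usage table.
SELECT 'DROP STATISTICS '
+ QUOTENAME(s.name) + '.' + QUOTENAME(t.name) + '.' + QUOTENAME(st.name) + ';'
FROM sys.stats st
JOIN sys.tables t ON t.object_id = st.object_id
JOIN sys.schemas s ON s.schema_id = t.schema_id
WHERE (st.auto_created = 1 OR st.user_created = 1) -- skip index-backed stats
AND NOT EXISTS (SELECT 1
FROM Scratch_Prod.scratch.StatsDbTableInfo u
WHERE u.DbName = DB_NAME()
AND u.SchemaName = s.name
AND u.TableName = t.name
AND u.StatName = st.name);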
--Jeff Moden
Change is inevitable... Change for the better is not.
May 30, 2017 at 6:05 pm
Nice to see you having to do XML, Jeff!
Have you tested this with an XML index? Something like this?
DBCC TRACEON(8666)
;
WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' as p)
SELECT --TOP 1000
ROWID = IDENTITY(int,1,1),
XMLFragment = db.DbInfo.query('.')
INTO StatsPreProcess
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
CROSS APPLY qp.query_plan.nodes('//p:Recompile') db (DbInfo)
OPTION (MAXDOP 1, RECOMPILE)
;
DBCC TRACEOFF(8666)
;
ALTER TABLE StatsPreProcess ADD PRIMARY KEY (ROWID);
CREATE PRIMARY XML INDEX ix_xml ON StatsPreProcess(XMLFragment);
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);