Every now and again I am asked about the meaning behind the automatically generated names for statistics in SQL Server. The quick answer is short, sweet and really easy. I give them a quick explanation and then often refer them to the blog post by Paul Randal on the topic.
The better answer is to show them what the auto-generated names really mean, alongside the great explanation from Paul. Finally, after years of the topic being on my backlog, I am sharing a script that will help decode those names and help to prove out fully what’s in a statistic name.
Fun with Stats
First let’s look at an example for an auto-generated statistic name.
_WA_Sys_0000000A_0000001B
It is important to understand the naming convention used here and that there are four distinct parts to the name. Each part is separated by an underscore (_). The leading underscore is unnecessary, so just ignore it. Here is a quick breakdown of the four parts
- WA – as Paul explained, this is directly related to where the SQL team is based – WAshington state.
- Sys – also as Paul explained, this just means that the stat was auto-created. We will be able to see more of that later in the script results.
- 0000000A – This value is a Hex value converted to a string and represents the Column ID for the statistic.
- 0000001B – This value is a Hex value converted to a string and represents the Object ID for the object upon which the statistic was created.
Given that parts 3 and 4 of the auto-generated name are Hex strings, we have to do a little trickery in order to convert them to the exact hexadecimal format so we can then properly convert the values to their integer counterparts. Once getting those conversions performed, the rest of the script is fairly easy. Let’s take a look.
SELECT PARSENAME(REPLACE(REPLACE(s.name,'_WA','WA'),'_','.'),4) AS Origin ,PARSENAME(REPLACE(REPLACE(s.name,'_WA','WA'),'_','.'),3) AS Stattype ,CAST( CONVERT(VARBINARY(8),'0x'+RIGHT('00000000'+REPLACE('0x' + CONVERT(VARCHAR(10),PARSENAME(REPLACE(REPLACE(s.name,'_WA','WA'),'_','.'),2)),'x',''),8),1) AS INT) AS ParsedColumnID ,CAST( CONVERT(VARBINARY(8),'0x'+RIGHT('00000000'+REPLACE('0x' + CONVERT(VARCHAR(10),PARSENAME(REPLACE(REPLACE(s.name,'_WA','WA'),'_','.'),1)),'x',''),8),1) AS INT) AS ParsedObjectID ,s.name AS StatName ,s.object_id AS ObjectID ,sc.column_id AS ColumnID ,sc.stats_column_id ,s.stats_id ,i.index_id , i.name AS IdxName FROM sys.stats s INNER JOIN sys.stats_columns sc ON s.object_id = sc.object_id AND sc.stats_id = s.stats_id LEFT OUTER JOIN sys.indexes i ON s.stats_id = i.index_id AND i.object_id = s.object_id INNER JOIN sys.objects o ON o.object_id = s.object_id CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp WHERE s.auto_created = 1;
As you can see, I am parsing the name into the various parts using the PARSENAME function. After splitting the parts out, I use some conversion techniques to try and get the hexadecimal representation (using VARBINARY). In the result set, I have included the object_id from the sys.stats view as well as the column_id from the sys.stats_columns view to demonstrate that the conversion is accurate. Here is an example result set. Also of note, I have filtered the results strictly down to those stats that have been auto created.
Looking at the results, the columns with “Parsed” in the name are those that have been decoded direct from the stats name, while the rest have come from the various views. Shown here, I have the ColumnID from the two sources in red, and the ObjectID in blue to help correlate the values.
That is great, now let’s step it up a notch to see just a few more results.
SELECT PARSENAME(REPLACE(REPLACE(s.name,'_WA','WA'),'_','.'),4) AS Origin ,PARSENAME(REPLACE(REPLACE(s.name,'_WA','WA'),'_','.'),3) AS Stattype ,CAST( CONVERT(VARBINARY(8),'0x'+RIGHT('00000000'+REPLACE('0x' + CONVERT(VARCHAR(10),PARSENAME(REPLACE(REPLACE(s.name,'_WA','WA'),'_','.'),2)),'x',''),8),1) AS INT) AS ParsedColumnID ,CAST( CONVERT(VARBINARY(8),'0x'+RIGHT('00000000'+REPLACE('0x' + CONVERT(VARCHAR(10),PARSENAME(REPLACE(REPLACE(s.name,'_WA','WA'),'_','.'),1)),'x',''),8),1) AS INT) AS ParsedObjectID ,s.name AS StatName ,s.object_id AS ObjectID ,o.name AS ObjName ,sc.column_id AS ColumnID ,c.name AS ColumnName ,sc.stats_column_id ,s.stats_id ,i.index_id , i.name AS IdxName, sp.last_updated, s.auto_created, sp.rows, sp.rows_sampled, sp.unfiltered_rows, sp.modification_counter, sp.steps, sp.persisted_sample_percent, i.is_hypothetical, i.is_ignored_in_optimization FROM sys.stats s INNER JOIN sys.stats_columns sc ON s.object_id = sc.object_id AND sc.stats_id = s.stats_id LEFT OUTER JOIN sys.indexes i ON s.stats_id = i.index_id AND i.object_id = s.object_id INNER JOIN sys.objects o ON o.object_id = s.object_id INNER JOIN sys.columns c ON c.column_id = sc.column_id AND c.object_id = s.object_id CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp WHERE s.auto_created = 1 AND s.user_created = 0 UNION SELECT CASE s.user_created WHEN 1 THEN 'User' ELSE 'MS' END AS Origin, CASE s.user_created WHEN 1 THEN 'User' ELSE 'Sys' END AS Stattype, '' AS ParsedColumnID, '' AS ParsedObjectID, s.name AS StatName, s.object_id AS ObjectID, s.name AS ObjName, sc.column_id AS ColumnID, c.name as ColumnName, sc.stats_column_id, s.stats_id, i.index_id, i.name AS IdxName, sp.last_updated, s.auto_created, sp.rows, sp.rows_sampled, sp.unfiltered_rows, sp.modification_counter, sp.steps, sp.persisted_sample_percent, i.is_hypothetical, i.is_ignored_in_optimization FROM sys.stats s INNER JOIN sys.stats_columns sc ON s.object_id = sc.object_id AND sc.stats_id = s.stats_id LEFT OUTER JOIN sys.indexes i ON i.object_id = s.object_id AND i.index_id = s.stats_id INNER JOIN sys.objects o ON o.object_id = s.object_id INNER JOIN sys.columns c ON c.object_id = s.object_id AND c.column_id = sc.column_id CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp WHERE o.is_ms_shipped = 0 AND s.auto_created = 0 ORDER BY s.object_id, s.stats_id;
This extended query will give us similar output to the following.
If you are looking closely at the code, you may be wondering why I have near identical versions of the query connected via a UNION statement. Well, if the stat is not an auto generated stat, then the parsing statements will error, and I would like to be able to retrieve extended information regarding all statistics so I can better understand what is going on as a whole with my statistics.
Wrapping it Up
Auto-generated statistics names can seem like they are entirely random, but there is a method to the madness. With a little effort and a bit of TSQL trickery, we can decode those names and reveal what the names really mean.
This article helps to decode those names and as a bonus, I show you how to pull additional pertinent information about your statistics to help you on your path to being an expetional data professional.
Interested in learning about some deep technical information instead? Check these out!
Want to learn more about your indexes? Try this index maintenance article or this index size article.