Blog Post

Decoded Statistics Names and More

,

Every now and again I am asked about the meaning behind the automatically generated names for statistics in SQL Server. The quick answer is short, sweet and really easy. I give them a quick explanation and then often refer them to the blog post by Paul Randal on the topic.

The better answer is to show them what the auto-generated names really mean, alongside the great explanation from Paul. Finally, after years of the topic being on my backlog, I am sharing a script that will help decode those names and help to prove out fully what’s in a statistic name.

Fun with Stats

First let’s look at an example for an auto-generated statistic name.

_WA_Sys_0000000A_0000001B

It is important to understand the naming convention used here and that there are four distinct parts to the name. Each part is separated by an underscore (_). The leading underscore is unnecessary, so just ignore it. Here is a quick breakdown of the four parts

  1. WA – as Paul explained, this is directly related to where the SQL team is based – WAshington state.
  2. Sys – also as Paul explained, this just means that the stat was auto-created. We will be able to see more of that later in the script results.
  3. 0000000A – This value is a Hex value converted to a string and represents the Column ID for the statistic.
  4. 0000001B – This value is a Hex value converted to a string and represents the Object ID for the object upon which the statistic was created.

Given that parts 3 and 4 of the auto-generated name are Hex strings, we have to do a little trickery in order to convert them to the exact hexadecimal format so we can then properly convert the values to their integer counterparts. Once getting those conversions performed, the rest of the script is fairly easy. Let’s take a look.

SELECT PARSENAME(REPLACE(REPLACE(s.name,'_WA','WA'),'_','.'),4) AS Origin
        ,PARSENAME(REPLACE(REPLACE(s.name,'_WA','WA'),'_','.'),3) AS Stattype
        ,CAST( CONVERT(VARBINARY(8),'0x'+RIGHT('00000000'+REPLACE('0x' + CONVERT(VARCHAR(10),PARSENAME(REPLACE(REPLACE(s.name,'_WA','WA'),'_','.'),2)),'x',''),8),1) AS INT) AS ParsedColumnID
        ,CAST( CONVERT(VARBINARY(8),'0x'+RIGHT('00000000'+REPLACE('0x' + CONVERT(VARCHAR(10),PARSENAME(REPLACE(REPLACE(s.name,'_WA','WA'),'_','.'),1)),'x',''),8),1) AS INT) AS ParsedObjectID
,s.name AS StatName
,s.object_id AS ObjectID
,sc.column_id AS ColumnID
,sc.stats_column_id
,s.stats_id
,i.index_id
, i.name AS IdxName
FROM sys.stats s
INNER JOIN sys.stats_columns sc
ON s.object_id = sc.object_id
AND sc.stats_id = s.stats_id
LEFT OUTER JOIN sys.indexes i
ON s.stats_id = i.index_id
AND i.object_id = s.object_id
INNER JOIN sys.objects o
ON o.object_id = s.object_id
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
WHERE s.auto_created = 1;

As you can see, I am parsing the name into the various parts using the PARSENAME function. After splitting the parts out, I use some conversion techniques to try and get the hexadecimal representation (using VARBINARY). In the result set, I have included the object_id from the sys.stats view as well as the column_id from the sys.stats_columns view to demonstrate that the conversion is accurate. Here is an example result set. Also of note, I have filtered the results strictly down to those stats that have been auto created.

Looking at the results, the columns with “Parsed” in the name are those that have been decoded direct from the stats name, while the rest have come from the various views. Shown here, I have the ColumnID from the two sources in red, and the ObjectID in blue to help correlate the values.

That is great, now let’s step it up a notch to see just a few more results.

SELECT PARSENAME(REPLACE(REPLACE(s.name,'_WA','WA'),'_','.'),4) AS Origin
        ,PARSENAME(REPLACE(REPLACE(s.name,'_WA','WA'),'_','.'),3) AS Stattype
        ,CAST( CONVERT(VARBINARY(8),'0x'+RIGHT('00000000'+REPLACE('0x' + CONVERT(VARCHAR(10),PARSENAME(REPLACE(REPLACE(s.name,'_WA','WA'),'_','.'),2)),'x',''),8),1) AS INT) AS ParsedColumnID
        ,CAST( CONVERT(VARBINARY(8),'0x'+RIGHT('00000000'+REPLACE('0x' + CONVERT(VARCHAR(10),PARSENAME(REPLACE(REPLACE(s.name,'_WA','WA'),'_','.'),1)),'x',''),8),1) AS INT) AS ParsedObjectID
,s.name AS StatName
,s.object_id AS ObjectID
,o.name AS ObjName
,sc.column_id AS ColumnID
,c.name AS ColumnName
,sc.stats_column_id
,s.stats_id
,i.index_id
, i.name AS IdxName,
   sp.last_updated,
   s.auto_created,
   sp.rows,
   sp.rows_sampled,
   sp.unfiltered_rows,
   sp.modification_counter,
   sp.steps,
   sp.persisted_sample_percent,
   i.is_hypothetical,
   i.is_ignored_in_optimization
FROM sys.stats s
INNER JOIN sys.stats_columns sc
ON s.object_id = sc.object_id
AND sc.stats_id = s.stats_id
LEFT OUTER JOIN sys.indexes i
ON s.stats_id = i.index_id
AND i.object_id = s.object_id
INNER JOIN sys.objects o
ON o.object_id = s.object_id
INNER JOIN sys.columns c
ON c.column_id = sc.column_id
AND c.object_id = s.object_id
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
WHERE s.auto_created = 1
AND s.user_created = 0
UNION
SELECT CASE s.user_created
           WHEN 1 THEN
               'User'
           ELSE
               'MS'
       END AS Origin,
       CASE s.user_created
           WHEN 1 THEN
               'User'
           ELSE
               'Sys'
       END AS Stattype,
       '' AS ParsedColumnID,
       '' AS ParsedObjectID,
       s.name AS StatName,
       s.object_id AS ObjectID,
   s.name AS ObjName,
       sc.column_id AS ColumnID,
   c.name as ColumnName,
       sc.stats_column_id,
       s.stats_id,
       i.index_id,
       i.name AS IdxName,
   sp.last_updated,
   s.auto_created,
   sp.rows,
   sp.rows_sampled,
   sp.unfiltered_rows,
   sp.modification_counter,
   sp.steps,
   sp.persisted_sample_percent,
   i.is_hypothetical,
   i.is_ignored_in_optimization
FROM sys.stats s
INNER JOIN sys.stats_columns sc
ON s.object_id = sc.object_id
   AND sc.stats_id = s.stats_id
LEFT OUTER JOIN sys.indexes i
ON i.object_id = s.object_id
   AND i.index_id = s.stats_id
INNER JOIN sys.objects o
ON o.object_id = s.object_id
INNER JOIN sys.columns c
ON c.object_id = s.object_id
AND c.column_id = sc.column_id
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
WHERE o.is_ms_shipped = 0
  AND s.auto_created = 0
ORDER BY s.object_id,
 s.stats_id;

This extended query will give us similar output to the following.

If you are looking closely at the code, you may be wondering why I have near identical versions of the query connected via a UNION statement. Well, if the stat is not an auto generated stat, then the parsing statements will error, and I would like to be able to retrieve extended information regarding all statistics so I can better understand what is going on as a whole with my statistics.

Wrapping it Up

Auto-generated statistics names can seem like they are entirely random, but there is a method to the madness. With a little effort and a bit of TSQL trickery, we can decode those names and reveal what the names really mean.

This article helps to decode those names and as a bonus, I show you how to pull additional pertinent information about your statistics to help you on your path to being an expetional data professional.

Interested in learning about some deep technical information instead? Check these out!

Want to learn more about your indexes? Try this index maintenance article or this index size article.

Original post (opens in new tab)
View comments in original post (opens in new tab)

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating