In the first post of this series I highlighted and described two stored procedures that are shipped from Microsoft. These stored procedures use different methods to display information about a table in a SQL Server database. In this post, I want to delve a little deeper into the sp_MStablespace stored procedure. In doing this, I plan to discover what causes the reporting difference between it and sp_spaceused. I will also unveil a script that can be used in a Set-based fashion to return this information, unlike the RBAR method required forsp_MStablespace. Like I stated in the first post, this analysis will be based on SQL 2008.
Problem
When one uses sp_MStablespace, one may be assuming that it will return the same information that sp_spaceused would return. While this procedure is undocumented and less widely used, the design is that it will give table and index information for the table one provides as an input into the procedure. In order to gain this information for more than one table (using this procedure), one must implement a looping mechanism.
Looping
I have been using a script that was passed along to me many years ago that helps to find the table information for all of the tables in a database. I have seen a similar script passed around on the internet in several locations. This script uses another Microsoft procedure to help loop through the tables and provide this sought-after information. The name of that procedure is sp_msforeachtable. Despite the looping mechanism employed, the script is pretty quick and dependable.
-- Create the temp table for further querying
CREATE TABLE #temp(
rec_idINT IDENTITY (1, 1),
table_nameVARCHAR(128),
nbr_of_rowsINT,
data_spaceDECIMAL(15,2),
index_spaceDECIMAL(15,2),
total_sizeDECIMAL(15,2),
percent_of_dbDECIMAL(15,12),
db_sizeDECIMAL(15,2))
-- Get all tables, names, and sizes
EXEC sp_msforeachtable @command1="insert into #temp(nbr_of_rows, data_space, index_space) exec sp_mstablespace '?'",
@command2="update #temp set table_name = '?' where rec_id = (select max(rec_id) from #temp)"
-- Set the total_size and total database size fields
UPDATE #temp
SET total_size = (data_space + index_space)/1024, db_size = (SELECT SUM(data_space + index_space)/1024 FROM #temp)
-- Set the percent of the total database size
UPDATE #temp
SET percent_of_db = (total_size/db_size) * 100
-- Get the data
SELECT *
FROM #temp
--ORDER BY nbr_of_rows DESC
ORDER BY total_size DESC
-- Comment out the following line if you want to do further querying
DROP TABLE #temp
The script is pretty straight forward in its design and goals. This looping can be resolved with a set based solution that works faster in most of my test cases.
Inconsistencies
The second problem noted was that the data returned from the sp_MStablespace proc does not always match the data returned by sp_spaceused. This is a substantially different problem to tackle. I poured over the two procedures for hours trying to find what the problem was that was causing the difference. My findings are somewhat surprising – they are both accurate and correct. How can they both be correct when both return different values? I hope to also explain that between this post and the subsequent post that will discuss sp_spaceused.
Discovery
In the first article, I mentioned some of the differences and subtleties of this stored procedure. I also made an assumption that this procedure will be deprecated or at least massively changed. As I discuss the internals of this procedure, I hope that assumption will be validated.
The objects employed in sp_MStablespace are sysobjects, sysindexes, sys.indexes, sys.partitions, and sys.allocation_units. This is actually not too bad of a start for rewriting this procedure due to the use of sys.indexes, sys.partitions, and sys.allocation_units. Let’s take a look at the use of the old and soon to be deprecated objects: sysobjects and sysindexes.
sysobjects
The role that this object plays is actually quite minor. This object should easily be replaced by sys.objects. The sysobjects view is being used to set the @id variable to the object id that is found in the sysobjects view. If the tablename that was passed into the proc was not found in sysobjects, then the procedure should raise an error.
sysindexes
The real heavy lifting of this procedure is being doing through the sysindexes view. From sysindexes, we are trying to determine the index size and the table size. The calculation is based on dpages and used. By summing those two fields, we will determine the size of the data in the table. Through the used value, we will determine the size of the index by subtracting the data size from the value we get from the used value. There are some additional qualifiers on those formulas to limit which index types will be calculated in which size calculation (i.e. datasize would be blobs, heaps and clustered indexes only). A further calculation is used in this procedure via the use of the spt_values table. The table is being used as a lookup to assign a pagesize value based on the value returned for the low attribute of “WINDOWS/NT.” One could have chosen to hard code this since it is just calculating the page size and we know that a page size is ~8k.
Caveat
Now that I have explained what is happening with those old deprecated objects, the caveat is that the procedure has a bit of logic in there to check for the version of SQL Server. Depending on the version, you could end up running the calculations based off the new objects. However, that does not eliminate the dependency on the compatibility objects. The same kind of logic is used to calculate the size of the data, but the indexes are still calculated via the sysindexes view. I find myself asking “Why Bother?” Take a look for yourself. I don’t see this procedure being released to a SQL 2000 environment, but it was written for backward compatibility. That seems like a waste of resources and development time to me. Either leave it as it was or rewrite the whole thing – that seems to make more sense to me than to make this procedure backwards compatible with the release of SQL 2005. The only thing I could imagine is that it was done with the intent of being able to be used in a Central Management server type of configuration.
Revision
I was looking to write a query that would replace the query I posted at the beginning of this article. As I delved into this, I knew I wanted something that would return consistent results with sp_spaceused. I wanted to have a reliable query that was quick and used the 2005 objects and DMVs. What I ended up with was something a bit bigger than the original query.
In SQL 2005, one can find a lot of information through the use of sys.dm_db_index_physical_stats. This function is quite handy in many situations where sysindexes was used previously. This function can be used to help determine index size as well as fragmentation levels as a couple of examples. For this new script, I needed to return all of the same data as the old script. I also wanted to include some additional information that I felt was beneficial to having a good insight into the database. So let’s break this script up again like I have shown in the past. I am reusing some code that I have demonstrated a few times for other scripts.
Part I
/* Part I */
--Drop Table #indstats
IF exists (SELECT * FROM tempdb.sys.objects WHERE name like '%#indstats%')
BEGIN
DROP TABLE tempdb.dbo.#indstats
END
BEGIN
CREATE TABLE #indstats (
IndStatsID INT PRIMARY KEY CLUSTERED
,database_id BIGINT
,index_id BIGINT
,IndexSizeMB DECIMAL(16,1)
,OBJECT_IDBIGINT
);
END
INSERT INTO #indstats (IndStatsID,database_id,index_id,OBJECT_ID,IndexSizeMB)
SELECT Row_Number() OVER (ORDER BY OBJECT_ID) AS IndStatsID
,database_id,index_id,OBJECT_ID
,CONVERT(DECIMAL(19,2),(SUM(ps.page_count))) * 8 /1024 AS IndexSizeMB--for accuracy
FROM sys.dm_db_index_physical_stats(DB_ID(),null,NULL,NULL,'DETAILED') ps
GROUP BY database_id,OBJECT_ID,index_id;
I have one major departure here from how I populated this temp table in the past. I have opted to use the windowing function Row_Number rather than an Identity value. I made this change due to the fewer resources required to run it. Using an Identity could require more disk IO than using the Row_Number. I want this to run quickly and not hamper system responsiveness or system resources. In observed testing, this change helps improve overall performance by 50%. I tested this against SQL 2008 and SQL 2005 running different patches and on different hardware. The overall improvement in performance was consistent.
Part II
/* Part II */
DECLARE @dbsize DECIMAL(19,2)
SET NOCOUNT ON
/*
** Summary data.
*/
BEGIN
SELECT @dbsize = SUM(CONVERT(DECIMAL(19,2),CASE WHEN type = 0 THEN SIZE ELSE 0 END)) * 8/1024
FROM sys.database_files
END
This bit is quite simply just so I could provide the total data file size (excluding log file). Spoiler alert: This piece of code is also used in my new script for sp_spaceused.
Part III
/* Part III */
BEGIN
WITH RegData AS (
SELECT a.container_id,p.OBJECT_ID,p.index_id,us.database_id
,FileGroupName = FILEGROUP_NAME(a.data_space_id)
,TableName = OBJECT_NAME(p.OBJECT_ID)
,NumRows = p.ROWS
,UsedPages = IsNull(a.used_pages,0)
,TotalPages = IsNull(a.total_pages,0)
,DataSizeMB = CONVERT(DECIMAL(19,2),IsNull(a.used_pages,0)) * 8/1024
,IndexSizeMB = CASE WHEN ps.index_id < 2 THEN 0 ELSE ps.IndexSizeMB END
,UserRequests = IsNull(us.user_seeks,0) + IsNull(us.user_scans,0) + IsNull(us.user_lookups,0)
,UserUpdates = IsNull(us.user_updates,0)
,LastUpdate = IsNull(us.last_user_update,null)
,RatioRequestsToUpdates = CAST(IsNull(us.user_seeks,0) + IsNull(us.user_scans,0) + IsNull(us.user_lookups,0) AS REAL)
/ CAST(CASE us.user_updates WHEN 0 THEN 1 ELSE us.user_updates END AS REAL)
FROM sys.allocation_units a
INNER Join sys.partitions p
ON p.hobt_id = a.container_id
And a.type = 1
LEFT Outer Join sys.dm_db_index_usage_stats us
ON us.OBJECT_ID = p.OBJECT_ID
And us.index_id = p.index_id
And us.database_id = DB_ID()
LEFT Outer Join #indstats ps
ON us.index_id = ps.index_id
And us.database_id = ps.database_id
And us.OBJECT_ID = ps.OBJECT_ID
--WHERE OBJECTPROPERTY(p.OBJECT_ID,'IsMSShipped') = 0
)
, LOBData AS (
SELECT a.container_id,p.OBJECT_ID,p.index_id,us.database_id
,FileGroupName = FILEGROUP_NAME(a.data_space_id)
,TableName = OBJECT_NAME(p.OBJECT_ID)
,NumRows = p.ROWS
,UsedPages = IsNull(a.used_pages,0)
,TotalPages = IsNull(a.total_pages,0)
,DataSizeMB = CONVERT(DECIMAL(19,2),IsNull(a.used_pages,0)) * 8/1024
,IndexSizeMB = CASE WHEN ps.index_id < 2 THEN 0 ELSE ps.IndexSizeMB END
,UserRequests = IsNull(us.user_seeks,0) + IsNull(us.user_scans,0) + IsNull(us.user_lookups,0)
,UserUpdates = IsNull(us.user_updates,0)
,LastUpdate = IsNull(us.last_user_update,null)
,RatioRequestsToUpdates = CAST(IsNull(us.user_seeks,0) + IsNull(us.user_scans,0) + IsNull(us.user_lookups,0) AS REAL)
/ CAST(CASE us.user_updates WHEN 0 THEN 1 ELSE us.user_updates END AS REAL)
FROM sys.allocation_units a
INNER Join sys.partitions p
ON p.partition_id = a.container_id
And a.type = 2
LEFT Outer Join sys.dm_db_index_usage_stats us
ON us.OBJECT_ID = p.OBJECT_ID
And us.index_id = p.index_id
And us.database_id = DB_ID()
LEFT Outer Join #indstats ps
ON us.index_id = ps.index_id
And us.database_id = ps.database_id
And us.OBJECT_ID = ps.OBJECT_ID
--WHERE OBJECTPROPERTY(p.OBJECT_ID,'IsMSShipped') = 0
)
, OverFlowData AS (
SELECT a.container_id,p.OBJECT_ID,p.index_id,us.database_id
,FileGroupName = FILEGROUP_NAME(a.data_space_id)
,TableName = OBJECT_NAME(p.OBJECT_ID)
,NumRows = p.ROWS
,UsedPages = IsNull(a.used_pages,0)
,TotalPages = IsNull(a.total_pages,0)
,DataSizeMB = CONVERT(DECIMAL(19,2),IsNull(a.used_pages,0)) * 8/1024
,IndexSizeMB = CASE WHEN ps.index_id < 2 THEN 0 ELSE ps.IndexSizeMB END
,UserRequests = IsNull(us.user_seeks,0) + IsNull(us.user_scans,0) + IsNull(us.user_lookups,0)
,UserUpdates = IsNull(us.user_updates,0)
,LastUpdate = IsNull(us.last_user_update,null)
,RatioRequestsToUpdates = CAST(IsNull(us.user_seeks,0) + IsNull(us.user_scans,0) + IsNull(us.user_lookups,0) AS REAL)
/ CAST(CASE us.user_updates WHEN 0 THEN 1 ELSE us.user_updates END AS REAL)
FROM sys.allocation_units a
INNER Join sys.partitions p
ON p.hobt_id = a.container_id
And a.type = 3
LEFT Outer Join sys.dm_db_index_usage_stats us
ON us.OBJECT_ID = p.OBJECT_ID
And us.index_id = p.index_id
And us.database_id = DB_ID()
LEFT Outer Join #indstats ps
ON us.index_id = ps.index_id
And us.database_id = ps.database_id
And us.OBJECT_ID = ps.OBJECT_ID
--WHERE OBJECTPROPERTY(p.OBJECT_ID,'IsMSShipped') = 0
), IndexSum AS (
SELECT a.OBJECT_ID
,AllDataSizeMB = SUM(CASE WHEN a.index_id in (0,1)
THEN IsNull(a.DataSizeMB,0) + IsNull(p2.DataSizeMB,0) + IsNull(p3.DataSizeMB,0)
ELSE IsNull(p2.DataSizeMB,0) + IsNull(p3.DataSizeMB,0)
END)
FROM RegData a
LEFT Outer Join LOBData p2
ON p2.container_id = a.container_id
LEFT Outer Join OverFlowData p3
ON p3.container_id = a.container_id
GROUP BY a.OBJECT_ID
), SummaryInfo AS (
SELECT
TableName = MAX(a.TableName)
,InRowDataSizeMB = SUM(IsNull(a.DataSizeMB,0))
,LOBDataSizeMB = SUM(IsNull(p2.DataSizeMB,0))
,OFlowDataSizeMB = SUM(IsNull(p3.DataSizeMB,0))
,NumRows = MAX(COALESCE(a.NumRows,p2.NumRows,p3.NumRows,0))
,AllUsedPages = SUM(IsNull(a.UsedPages,0) + IsNull(p2.UsedPages,0) + IsNull(p3.UsedPages,0))
,AllPages = SUM(IsNull(a.TotalPages,0) + IsNull(p2.TotalPages,0) + IsNull(p3.TotalPages,0))
,FreeDataSpace = CONVERT(DECIMAL(19,2),SUM(IsNull(a.TotalPages,0) + IsNull(p2.TotalPages,0) + IsNull(p3.TotalPages,0))
- SUM(IsNull(a.UsedPages,0) + IsNull(p2.UsedPages,0) + IsNull(p3.UsedPages,0)))* 8 / 1024
,AllDataSizeMB = MAX(ids.AllDataSizeMB)
,IndexSizeMB = SUM(IsNull(a.IndexSizeMB,0)) + SUM(IsNull(p2.IndexSizeMB,0)) + SUM(IsNull(p3.IndexSizeMB,0))
,UserRequests = AVG(IsNull(a.UserRequests,0) + IsNull(p2.UserRequests,0) + IsNull(p3.UserRequests,0))
,UserUpdates = AVG(IsNull(a.UserUpdates,0) + IsNull(p2.UserUpdates,0) + IsNull(p3.UserUpdates,0))
,LastUpdate = MAX(COALESCE(a.LastUpdate,p2.LastUpdate,p3.LastUpdate,null))
,DatabaseSize = @dbsize
FROM RegData a
LEFT Outer Join LOBData p2
ON p2.container_id = a.container_id
LEFT Outer Join OverFlowData p3
ON p3.container_id = a.container_id
LEFT Outer Join sys.indexes i
ON i.OBJECT_ID = a.OBJECT_ID
And i.index_id = a.index_id
LEFT Outer Join IndexSum ids
ON i.OBJECT_ID = ids.OBJECT_ID
--WHERE filegroup_name(a.data_space_id) = 'Primary'
GROUP BY a.OBJECT_ID
)
SELECT TableName,InRowDataSizeMB,LOBDataSizeMB,OFlowDataSizeMB,NumRows,AllUsedPages
,AllPages,FreeDataSpace,AllDataSizeMB,AllDataSizeMB,AllDataSizeMB,IndexSizeMB
,UserRequests,UserUpdates,LastUpdate,DatabaseSize
,PercentofDB = ((IndexSizeMB + AllDataSizeMB) / DatabaseSize) * 100
FROM SummaryInfo
ORDER BY PercentofDB DESC
END
Did I mention that this was lengthier than the predecessor? I have five CTEs doing slightly different things here with this query. The first three are gathering the information for the different Allocation Units and the fifth is summarizing that data into a single set. The fourth cte is used to calculate the actual data size dependent on type of index employed. I wanted to be able to display the size of the different Allocation Units so I could get a true feel for how the database looks in a single query (with regards to size and table size). I also thought it important to be able to determine how much activity that table is seeing. Knowing if the table is highly active, has BLOB data, and it’s size could potentially help me in determining if that table should be in a separate filegroup and on a separate LUN.
I have also thrown in some additional information that will be consistent with the new script for sp_spaceused. The additional field here is in regards to free space. This was mostly used as a check between the two scripts to help in developing the final solutions. I felt it useful to get an idea of how much space is available of the allocated pages.
The critical calculations are performed based on information retrieved from the used_pages attribute in sys.allocation_units and the index size returned by sys.dm_db_index_physical_stats.
I left a few lines commented out in the script mostly for demonstration purposes. I wanted to also be able to see just the user objects (IsMsShipped = 0 ), and also to be able to filter by specific filegroup. Those are helpful pieces based on the task at hand.
Results
This script resolves the issue of returning the information for all tables in a set based fashion and it does it rapidly. In the cases that it does not perform rapidly, one can break it up into the separate segments should one decide to tweak the column set or query. The remaining question then is “Does it match the results of sp_spaceused?”. The answer to that is yes and no. I will discuss that in further detail in the final installment of this series. As the query is currently constructed, the answer is no. However, let it be understood that the two queries are really returning different things and one may perceive them to be returning the same thing. That is not entirely true.
You can download the script from here.