At long last I am bringing the next installment in this mini-series. You can find the rest of the articles by perusing the last article I wrote on the topic. That article was a correction to the TableSpace script about which I had already blogged.
As you will recall, this series was designed to compare and contrast two MS provided stored procedures. Those procedures are sp_spaceused and sp_MStablespace. They both calculate table sizes and information but they do not always arrive at the same result. It was the difference in results that prompted this little series of blogs. In this article I will be examining sp_spaceused, or at least the re-write I did for this proc. There will likely be one more article in the series. That article will be to compare performance and wrap up the series.
Problem
As was stated in the Tablespace article, there are two main problems with this procedure. The first is that some measure of looping must be used, if one is to use the stored procedure as is, to gather all of the information for all of the tables in a database. This problem can be rectified by understanding the inner workings of the stored procedure. The second issue is that it does not return data that is consistent with that of sp_MStablespace. I would like for the data to be consistent between the two scripts that may be used. If the data is the same, then it will only be a matter of preference as to which script one decides to use. This should help eliminate doubt.
Looping
Much the same as needs to be done with the sp_MStablespace procedure, a loop can be used to return the data for all tables. Based on my experience, the looping query does not take long to execute. It really is somewhat unnecessary though. Here is a sample of how such a loop might be done.
/*
Looping mechanism for sp_spaceused
*/
-- Create the temp table for further querying
CREATE TABLE #spaceused(
rec_idINT IDENTITY (1, 1),
TableNameVARCHAR(128),
NumRowsVARCHAR(11),
reservedVARCHAR(18),
DataSpaceMBVARCHAR(18),
IndexSpaceMBVARCHAR(18),
unusedVARCHAR(18),
TotalSizeMBDECIMAL(15,2),
percent_of_dbDECIMAL(15,12),
DBSizeMBDECIMAL(15,2)
)
-- Get all tables, names, and sizes
EXEC sp_msforeachtable @command1="insert into #spaceused(TableName,NumRows,reserved, DataSpaceMB, IndexSpaceMB,unused) exec sp_spaceused '?'"
UPDATE #spaceused
SET DataSpaceMB = REPLACE(DataSpaceMB,' KB','')
,IndexSpaceMB = REPLACE(IndexSpaceMB,' KB','')
-- Set the TotalSizeMB and total database size fields
UPDATE #spaceused
SET TotalSizeMB = (CONVERT(DECIMAL(15,2),DataSpaceMB) + CONVERT(DECIMAL(15,2),IndexSpaceMB))/1024
,DBSizeMB = (SELECT SUM(CONVERT(DECIMAL(15,2),DataSpaceMB) + CONVERT(DECIMAL(15,2),IndexSpaceMB))/1024 FROM #spaceused)
-- Set the percent of the total database size
UPDATE #spaceused
SET percent_of_db = (TotalSizeMB/DBSizeMB) * 100
-- Get the data
--sp_spaceused
SELECT TableName,NumRows, CONVERT(DECIMAL(15,2),DataSpaceMB)/1024 AS DataSpaceMB
,CONVERT(DECIMAL(15,2),IndexSpaceMB)/1024 AS IndexSpaceMB, TotalSizeMB, percent_of_db, DBSizeMB
FROM #spaceused
ORDER BY TotalSizeMB DESC
/* Drop Temp Table
DROP TABLE #spaceused
*/
In this version of the loop, it must be noted that an additional step must be taken due to the nature of the data that is returned by sp_spaceused. This procedure returns ‘ KB’ in the output. This additional data must be treated so proper calculations can be performed.
Inconsistencies
As stated in the aforementioned article on sp_MStablespace, the data returned between the two does not always match (in regards to table size). The reason for this is due to a difference in calculation and it seems intent. Thus both procs can be deemed accurate and correct. However, that does not breed any higher level of confidence in the return data from both procedures. I hope to elaborate somewhat on these subtle differences.
Discovery
I am using the stored procedure from SQL 2008 as the basis for my analysis. One of the first things I have noted is that the procedure has been mostly updated in support of the new SQL 2008 objects. The one exception is that it still uses dbo.sysfiles to try and determine the database size and logsize. Beyond that, the proc makes use of the system views sys.partitions, sys.allocation_units, sys.internal_tables, sys.objects and sys.dm_db_partition_stats.
sys.objects
The use of this view is primarily to get the object_id of the table that was passed in to the stored procedure. For this particular version, this can be eliminated since we are trying to get a solution that does not require looping. For a later version, I will look into reincorporating that view into the script to allow either a single table or all tables to be returned.
sysfiles
As mentioned already, sysfiles is present in order to retrieve the database size and the logfile size. This can be replaced by sys.database_files. In testing, I have found that using sys.database_files is more consistent and it is also more accurate than using sysfiles.
Revision
Since this procedure was already mostly up to date with the new objects for 2005 / 2008, most of it is reusable and the changes are less dramatic than they were for the counterpart sp_MStablespace. Much the same as the script for MStablespace, I did employ the use of CTEs to get the data that I wanted. I use three CTEs to retrieve all of the necessary information. I am also using the same calculations that were used in the original procedure that determined table size. The exception being in the database size and the log size. For ease of consumption, I will once again break the script up into smaller chunks.
Part I
/*similar to sp_spaceused */
DECLARE @dbsize DECIMAL(19,2)
,@logsize DECIMAL(19,2)
SET NOCOUNT ON
/*
** Summary data.
*/
BEGIN
SELECT @dbsize = SUM(CONVERT(DECIMAL(19,2),CASE WHEN type = 0 THEN SIZE ELSE 0 END)) * 8/1024
, @logsize = SUM(CONVERT(DECIMAL(19,2),CASE WHEN type = 1 THEN SIZE ELSE 0 END)) * 8/1024
FROM sys.database_files
END
This is pretty straightforward here. I am retrieving and summing the values needed for the database size and the log size.
Part II
BEGIN
WITH FirstPass AS (
SELECT OBJECT_ID,
ReservedPage = CONVERT(DECIMAL(19,2),SUM(reserved_page_count)) * 8/1024,
UsedPage = CONVERT(DECIMAL(19,2),SUM(used_page_count)) *8/1024,
PageCnt = SUM(
CONVERT(DECIMAL(19,2),CASE
WHEN (index_id < 2)
THEN (used_page_count)
ELSE lob_used_page_count + row_overflow_used_page_count
END
)) * 8/1024,
RowCnt = SUM(
CASE
WHEN (index_id < 2)
THEN row_count
ELSE 0
END
)
FROM sys.dm_db_partition_stats
--Where OBJECTPROPERTY(OBJECT_ID,'IsMSShipped') = 0
GROUP BY OBJECT_ID
)
,InternalTables AS (
SELECT ps.OBJECT_ID,
ReservedPage = CONVERT(DECIMAL(19,2),SUM(reserved_page_count)) * 8/1024,
UsedPage = CONVERT(DECIMAL(19,2),SUM(used_page_count)) *8/1024
FROM sys.dm_db_partition_stats ps
INNER Join sys.internal_tables it
ON it.OBJECT_ID = ps.OBJECT_ID
And it.internal_type IN (202,204,211,212,213,214,215,216)
WHERE it.parent_id = ps.OBJECT_ID
--And OBJECTPROPERTY(ps.OBJECT_ID,'IsMSShipped') = 0
GROUP BY ps.OBJECT_ID
)
,Summary AS (
SELECT
ObjName = OBJECT_NAME (f.OBJECT_ID),
NumRows = MAX(f.rowcnt),
ReservedPageMB = SUM(IsNull(f.reservedpage,0) + IsNull(i.ReservedPage,0)),
DataSizeMB = SUM(f.PageCnt),
IndexSizeMB = SUM(CASE WHEN (f.UsedPage + IsNull(i.UsedPage,0)) > f.PageCnt
THEN ((f.UsedPage + IsNull(i.UsedPage,0)) - f.PageCnt) ELSE 0 END) ,-- Equivalent of max_record_size from sys.dm_db_index_physical_stats
UnusedSpace = SUM(CASE WHEN (f.ReservedPage + IsNull(i.ReservedPage,0)) > (f.UsedPage + IsNull(i.UsedPage,0))
THEN ((f.ReservedPage + IsNull(i.ReservedPage,0)) - (f.UsedPage + IsNull(i.UsedPage,0))) ELSE 0 END),
DBSizeMB = @Dbsize,
LogSizeMB = @logsize
FROM FirstPass F
LEFT Outer Join InternalTables i
ON i.OBJECT_ID = f.OBJECT_ID
GROUP BY f.OBJECT_ID
)
SELECT ObjName,NumRows, ReservedPageMB, DataSizeMB, IndexSizeMB, UnusedSpace, DBSizeMB, LogSizeMB,
PercentofDB = ((IndexSizeMb + DataSizeMB) / @DBsize) * 100
FROM Summary
ORDER BY PercentofDB DESC
END
Note here that I have left code to remove MSShipped objects from the total. This is a placeholder simply for if somebody would like to test that result. The only thing other than that to explain is the internal_types. I am checking for a range of internal table types just as is done in the original stored procedure. A quick rundown of what those types is as follows (and can be found on MSDN): 202 = xml index nodes, 204 = fulltext_catalog_map, 211 = FULLTEXT_AVDL, 213 = FULLTEXT_DOCID_STATUS, 214 = FULLTEXT_INDEXED_DOCID, 215 = FULLTEXT_DOCID_FILTER, 216 = FULLTEXT_DOCID_MAP, and I was unable to locate what type 212 is. I presume it is another FullText type. For those types, the data is not counted as part of the data but is aggregated into the index size.
The final point with this script to note is in regards to an additional comment I left in the script. The comment is in regards to how the size is calculated and I wanted to be certain that it was explained how that relates. In the MStablespace script, I use sys.dm_db_index_physical_stats to help determine the index and table size. My comment here relates to that DMF in that the spaceused version is equivalent to the max_record_size field in that DMF. This procedure and script determine the size of a table based on the maximum size and of a page and then multiplies that across all pages. The MStablespace script is more equivalent to calculating based on average fill of those pages. The MStablespace script can easily be modified to support the max size and thus bring both scripts into harmony. Beyond those points, I think the rest of the script is pretty self-explanatory.
Results
This script now permits us to get the full size of all tables in a database without the need for looping through a single stored procedure. We also see that at this point we have outlined why the two Microsoft stored procedures returned different results. With both scripts now returning the same data, and doing so consistently we can now begin to evaluate performance measures.