In my last article "Suggestion of Datatypes" I described a technique to scan the existing data to determine a better data type for columns, which may save some space. In this article I will focus on determining the size of the objects. In other words, if I have a database of 50GB, I would like to know which are the largest tables in terms of occupying the space (including Indexes and Text/Image Data). The sp_spaceused system procedure can be used to determine this type of information. With this knowledge we can find out how the sp_spaceused stored procedure gets it information, and we can customize it based on our requirements.
With
this kind of routine, you should be able to keep an eye on growing
tables, and predict the future disk space requirements (assuming the
activities remain more or less the same).
Objective:
To determine the space taken by an object within the database.
Code Review:
If you are guessing I will be relying on sysindexes then you are correct. Before we go into the details of the coding, let me explain sysindexes in brief:
It contains one row for each index and table in the database. Column Indid explains whether the entry is of an index or a table:
When Indid = 0 it means a table without a clustered Index
When Indid =1 it means a table with a clustered Index
When Indid >1 it means an index
When Indid = 255 it means an entry for tables that have text or image data
For Indid =1 or 0, column Used determines the total count of pages for all index and table data; and for Indid = 255, Used is a count of the pages used for text or image data.
Column DPages stores the count of data pages for a table with or without clustered Index. For Indid= 255, it is set to 0.
As we know sysindexes, may not always report current information, so to ensure that, we will be required to run DBCC UPDATEUSAGE.
Let us go into the code now.
-- As mentioned earlier, to remove inaccuracies on sysindexes we will run:DBCC UpdateUsage (0)
WITH COUNT_ROWS, NO_INFOMSGS
-- We will create a table to store the information:
if NOT exists (select * from dbo.sysobjects
where id = object_id(N'[dbo].[SpaceUsedByObject]')
and OBJECTPROPERTY(id, N'IsUserTable') = 1)
Begin
CREATE Table SpaceUsedByObject
(
[Id] INT Identity(1,1),
ObjName sysname,
TotalSpaceUsed INT,
DataSpaceUsed INT,
IndexSpaceUsed INT,
RowCnt INT,
TodayDate DateTime Default Getdate()
)
End
As mentioned earlier, Indid 0 or 1 records the total count of pages for all index and table data, and Indid 255 determine count of the pages used for text or image data. With this information, let add records to the newly created table: */
INSERT INTO SpaceUsedByObject(ObjName,TotalSpaceUsed)SELECT Object_Name(sysindexes.ID),SUM(Used)
FROM sysindexes JOIN sysobjects
ON sysindexes.Id=sysobjects.Id and type='u'
WHERE Indid IN (0,1,255)
GROUP BY sysindexes.ID
-- We will create a temporary table (a table variable could also be used):
CREATE Table #SpaceUsedByObjectForDpage
(
ObjName sysname,
DataSpaceUsed INT,
rowcnt INT
)
-- Let us store the DPages
-- (Data pages which consist of actually table data):
INSERT INTO #SpaceUsedByObjectForDpage
SELECT Object_Name(sysindexes.ID),Dpages,rowcnt
FROM sysindexes
JOIN sysobjects
ON sysindexes.Id=sysobjects.Id
and type='u'
WHERE Indid IN (0, 1)
-- Now we will add the Text and images pages to Dpages:
UPDATE #SpaceUsedByObjectForDpage
SET DataSpaceUsed= DataSpaceUsed
+ (SELECT ISNULL(SUM(Used),0)
FROM SysIndexes S1
WHERE Object_Name(s1.Id)= #SpaceUsedByObjectForDpage.ObjName
AND indid=255
)
/* As we have the total
used space and Data pages (table space excluding Indexes), it is
fairly simple to determine the Index space (i.e. Total Space –
Data Pages), so let’s update the main table with our
calculation: */
UPDATE SpaceUsedByObject
SET IndexSpaceUsed = TotalSpaceUsed
- (SELECT DataSpaceUsed
FROM #SpaceUsedByObjectForDpage S1
WHERE S1.ObjName=SpaceUsedByObject.ObjName
AND SpaceUsedByObject.IndexSpaceUsed IS NULL
)
WHERE IndexSpaceUsed IS NULL
-- Now Data pages and row count:
UPDATE SpaceUsedByObject
SET DataSpaceUsed = ( SELECT S1.DataSpaceUsed
FROM #SpaceUsedByObjectForDpage S1
WHERE S1.ObjName=SpaceUsedByObject.ObjName
),
rowcnt=( SELECT S1.rowcnt
FROM #SpaceUsedByObjectForDpage S1
WHERE S1.ObjName=SpaceUsedByObject.ObjName
)
WHERE DataSpaceUsed IS NULL
Additional points
Our calculations are in terms of pages. Size of each page in SQL 2000 is 8KB. You can also determine page size by querying the undocumented system table:
SELECT lowfrom master.dbo.spt_values
where number = 1 and type = 'E'
We can come to this calculation by going the other way, meaning by calculating Index and Data pages, and then calculating total used space. We can set this up as a job monthly or biweekly, or on demand, to see the growth of a particular table, and determine a growth pattern of our database. Sysindexes have other information like total reserved space (column Reserved). and xmaxlen and minlen determine the
maximum and minimum size of the row.
If you see xmaxlen for any table to be more than 4030 bytes, it means only 1 record is able to fit in a page. So that means we will have page splits and space wasted.
Example If the records size is 5000 bytes, SQL Server is not able to utilize 3060 bytes and this will be true for all the records with similar size.
Or for example, if I have a record size of 2200 bytes, only 2 records will fit in the page, occupying 4400 bytes and leaving 3660 bytes under utilized. There are several ways to determine such information and manage our resources properly. In particular, see my previous article to try to determine a proper data type, and see if it is possible to reduce the record size so less unutilized space is left.
Conclusion:
The sysindexes system table has much important information, and we can build lots of reports from it to suit our needs. I do agree SQL Server has many system procedures and DBCC commands which help us to get most of the information, but at the same time I believe a true DBA should know how things are internally worked, so he/she can customized the result to suit their requirement.
We can use undocumented SP_MSTablespace and SP_MSindexspace to have similar information. Both the procedure will report inaccurate results if DBCC UpdateUsage is not executed. Ken Henderson in his book The Guru’s Guide to Transact-SQL had mentioned more than 100 undocumented commands, languages, and features.