Once again it looks like I was early for the T-SQL Tuesday event. Last month I submitted my entry on reporting as was the topic last month. If you recall, last month I was a week early. As it turns out, this month I am a full month early. This month we have the distinct pleasure of writing about BLOBS. Last month I wrote about Reporting on BLOBS. This month, the event is hosted by Michael Coles.
This month the topic is a little more difficult for me. As luck would have it, I have written a fair share of articles about BLOB data, or at least involving BLOBs over the last month. What should I write that is unique to what I have already written? Or could I slide by with simply writing a recap of some of those posts? Hmmmmm…
Well, lucky I know of something that I can add to the meme on the topic of LOB or BLOB data. Oh, and it is more than just the fish you see to the left.
Did You Know…
In the DMVs for SQL 2005 and SQL 2008 there is more than one place you can find the information about the size of your LOB data? And it goes without saying that there is more than one way to find information about LOBs in your database. I have outlined a couple of methods already in previous posts (and in another post coming up that will conclude the table size procs series that I started).
One of the methods to find a little information about the LOBs is through a function that I have been using for various different purposes. This function is the sys.dm_db_index_physical_stats function. In this function there is a column returned called allocation_unit_type. If you want to learn more about this function and the column, I recommend you read what Microsoft has to say about it from this page. In short, this column contains what type of data is being stored for the specified database and object within that database with its varying other stats.
That’s all fine and dandy…
Since we know (from reading that Microsoft document and from past experiences) that we can retrieve various pieces of information from this function such as page counts, allocation unit types, record count and avg record size; it stands to reason that we can also determine the size of our BLOB infrastructure through the use of this function. The question is, how do we calculate the size and which option should we use to do it?
Weapons of Choice
Due to the length of time it may take to run the function, one may want to have a few options. It is also desirable that one use the most efficient method to retrieve the data. In order to find this, I decided to use test each of the function options to determine accuracy and speed. These options are Limited (Default), Sampled, and Detailed.
The script setup for each is essentially the same script. The only difference being that option has been changed for the data pertinent to that option. After running each of the scripts, I find the results to be somewhat surprising. I will be examining execution time, Io statistics and the execution plans and execution cost. I will also note that I am also dumping the results of each of the function calls into a temp table so I can run multiple queries against that data without having to wait for the function to run again.
Limited
IF exists (SELECT * FROM tempdb.sys.objects WHERE name like '%#LimitedLOB%')
BEGIN
DROP TABLE tempdb.dbo.#LimitedLOB
END
BEGIN
CREATE TABLE #LimitedLOB (
LOBStatsID INT PRIMARY KEY CLUSTERED
,database_id BIGINT
,index_id BIGINT
,IndexSizeMB DECIMAL(16,1)
,OBJECT_IDBIGINT
,AllocUnitTypeVARCHAR(24)
);
END
SET STATISTICS TIME ON
SET STATISTICS io ON
INSERT INTO #LimitedLOB (LOBStatsID,database_id,index_id,OBJECT_ID ,AllocUnitType,IndexSizeMB)
SELECT Row_Number() OVER (ORDER BY OBJECT_ID) AS IndStatsID
,database_id,index_id,OBJECT_ID,alloc_unit_type_desc
,CONVERT(DECIMAL(19,2),(SUM(ps.page_count))) * 8 /1024 AS IndexSizeMB--for accuracy
FROM sys.dm_db_index_physical_stats(DB_ID(),null,NULL,NULL,'Limited') ps
GROUP BY database_id,OBJECT_ID,index_id,alloc_unit_type_desc
SET STATISTICS TIME OFF
SET STATISTICS io OFF
The query is pretty straightforward and self-explanatory. I am querying the function for all tables in the database in a limited fashion. I am grouping those results into usable groupings based on allocation unit type, object, index and database. From this query, I see a CPU Time of 3062 ms and elapsed time of 8809ms. I have also taken the sum of all objects returned from this query to get a value of 1163.5 MB for this database. At this point I have not broken out the sum by allocation unit type.
Sampled
For the sampled run, I used the same query with minor changes.
IF exists (SELECT * FROM tempdb.sys.objects WHERE name like '%#SampledLOB%')
BEGIN
DROP TABLE tempdb.dbo.#SampledLOB
END
BEGIN
CREATE TABLE #SampledLOB (
LOBStatsID INT PRIMARY KEY CLUSTERED
,database_id BIGINT
,index_id BIGINT
,IndexSizeMB DECIMAL(16,1)
,OBJECT_IDBIGINT
,AllocUnitTypeVARCHAR(24)
);
END
SET STATISTICS TIME ON
SET STATISTICS io ON
INSERT INTO #SampledLOB (LOBStatsID,database_id,index_id,OBJECT_ID ,AllocUnitType,IndexSizeMB)
SELECT Row_Number() OVER (ORDER BY OBJECT_ID) AS IndStatsID
,database_id,index_id,OBJECT_ID,alloc_unit_type_desc
,CONVERT(DECIMAL(19,2),(SUM(ps.page_count))) * 8 /1024 AS IndexSizeMB--for accuracy
FROM sys.dm_db_index_physical_stats(DB_ID(),null,NULL,NULL,'Sampled') ps
GROUP BY database_id,OBJECT_ID,index_id,alloc_unit_type_desc
SET STATISTICS TIME OFF
SET STATISTICS io OFF
;
SET STATISTICS TIME OFF
SET STATISTICS io OFF
The only changes being the option and the name of the temp table. For this query, I get a CPU time of 3203ms, elapsed time of 5768ms and a total for the objects of 11678.2 MB. If I were to stop at this point I would conclude that the actual size would be larger than I just saw and I could expect that the cpu time and elapsed times should be longer. The bit of surprise is that the sampled return is faster overall than using the limited option.
Detailed
Same script as the prior to tests. Again, the only changes are in regards to the temp table name and the use of the Detailed option in the function. When I run the detailed query, I see that the cpu time jumps to 12313ms, elapsed time jumps to 29191ms and the total size decreases to 11635 MB. What? The total size decreased. It is more in line with what we see from the Limited option. With that knowledge, one may conjecture that it would suffice for these queries to simply run the function in Limited fashion. The speed of that option is comparable to the Sampled run and the data appears to be more accurate (since sample is just an estimate).
Conclusion
The execution plan and query cost for each run is identical in my testing. The real differences were seen in execution time. I can now take this data and modify the queries slightly further to just return the lines that tell me the size information for the allocation unit type of “LOB_DATA.” This will help me to gain a better understanding of the sizes of that kind of data in my environment and a better understanding of the environment in general.