January 9, 2009 at 8:45 am
richcoulson (1/9/2009)
We have a warehouse database that is normally around 400 gigs, which compresses down to 80 gigs with row-level compression. Performance of queries varies widely depending on the disk IO needed to satisfy the query. I do have one very disk-IO-intensive query that normally runs in 2 minutes, but now runs in 1 minute. Cutting the time in half is very impressive. Both compressed and uncompressed databases are on the same server, running under the same SQL Server instance. The server is one HP loaned us to do some testing with. It is their latest DL580 with 24 Intel cores and 32 gigs of RAM and is blazing fast.
Something interesting I noticed, but have not had a chance to dig into, is that once the data is cached in memory and the same queries are run a second time, the performance seems to be about 20% slower for the compressed database. Seems odd, since I am pretty sure data in cache does not get compressed. Both data and indexes are using row-level compression. I haven't had a chance to look at the execution plans yet to see if they are different.
Can I just say...wow...that's a lot of compression. Have you looked at page level yet?
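For anyone reading along who hasn't tried it, row compression of the kind described in the quote is applied per table or index with a rebuild. A minimal sketch, with placeholder object names:

-- dbo.FactSales and IX_FactSales_Date are placeholder names for illustration only
ALTER TABLE dbo.FactSales REBUILD WITH (DATA_COMPRESSION = ROW);
ALTER INDEX IX_FactSales_Date ON dbo.FactSales REBUILD WITH (DATA_COMPRESSION = ROW);
-- or rebuild every index on the table with the same setting in one statement
ALTER INDEX ALL ON dbo.FactSales REBUILD WITH (DATA_COMPRESSION = ROW);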
January 9, 2009 at 8:49 am
I thought that data stayed compressed in memory for read-aheads, but once it needed to be compared to something else it was uncompressed.
That's great compression, but it is strange that those queries are taking longer. I might call PSS and make sure there's no bug here. Or you might gather some more info, post a question here or on the MS forums, and I'll ping someone at MS.
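One easy bit of info to gather while the demo box is still around is how much of the buffer pool each database is holding after the warm runs. A rough sketch using sys.dm_os_buffer_descriptors (pages are 8 KB, so COUNT(*) * 8 / 1024 gives MB):

-- buffer pool usage per database, largest first
SELECT DB_NAME(database_id) AS database_name
     , COUNT(*) * 8 / 1024  AS buffer_pool_MB
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY buffer_pool_MB DESC;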
January 9, 2009 at 8:51 am
No, haven't tried page level yet. Our Red Gate compressed backups went down from 40 gigs to 24 gigs for the compressed database. Compression is going to be huge for us as we look for more server consolidation. With more processors being available on servers now, we seem to be more and more disk constrained, which should be lessened with compression. We also have to repopulate pilot and test databases every night, which will now be quicker with smaller backups.
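If there's time before the server goes back, a quick page-level estimate can be pulled for a single large table with sp_estimate_data_compression_savings; the schema and table names here are only placeholders:

EXEC sp_estimate_data_compression_savings
      @schema_name = 'dbo'
    , @object_name = 'FactSales'   -- placeholder table name
    , @index_id = NULL             -- NULL = all indexes on the table
    , @partition_number = NULL     -- NULL = all partitions
    , @data_compression = 'PAGE';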
January 9, 2009 at 9:09 am
Just to clarify, the 20% increase is compared to the same query run from cache against the non-compressed database.
Times:
non-compressed first run 2 minutes, second run 11 seconds
compressed first run 1 minute, second run 14 seconds
I run dbcc freeproccache and dbcc dropcleanbuffers before each first run. The above times are consistently reproduced.
I have seen other queries with similar results for the cached runs. Keep in mind my testing has not been extensive and has only been with a handful of queries that have been poor performers. We have to give our demo server back to HP, so today will be my last day of testing. We will be setting up a permanent test server in the next few weeks, so we will be able to resume further testing. Steve, if there are any stats, such as statistics io output, that you want now, I can send them to you. We do have a dedicated Microsoft support engineer we can bring in as well once we get our permanent test server up.
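For anyone wanting to repeat this kind of cold/warm timing, here is a sketch of the pattern described above. Note that dropping clean buffers and freeing the plan cache affects the whole instance, so only do this on a test box:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- cold run: clear the plan cache and clean buffer pages first
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
-- <run the query under test here>

-- warm run: execute the same query again without clearing anything
-- <run the query under test here>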
January 11, 2009 at 9:54 pm
Thanks for the article! There are a lot of misconceptions about compression out there, and you will never know if it's right for you until you evaluate, test, evaluate, test, etc. The script provided with the article will go a long way towards simplifying this process for me!
October 27, 2009 at 12:54 pm
I used this on one of our larger databases. I had to make some changes as I was getting overflow errors.
Changed all the decimal definitions from Decimal(10,2) to Decimal(15,2)
Changed a number of items that were INT to BIGINT
--Revised Code
CREATE PROCEDURE [dbo].[USP_Compression_Savings_By_DB]
@checkdb nvarchar(255) = null -- database we will be checking (by default runs all databases)
, @admindbname nvarchar(255) = null -- administrative database for holding the data (defaults to the database where the procedure is created)
AS
/****************************************************************************************
* THIS IS FOR ESTIMATION PURPOSES ONLY. YOU SHOULD PERFORM FULL TESTS OF THE ACTUAL     *
* SAVINGS AND ANY POTENTIAL NEGATIVE IMPACT ON PERFORMANCE PRIOR TO IMPLEMENTING ANY    *
* CHANGES IN A PRODUCTION ENVIRONMENT.                                                  *
*                                                                                       *
* IT IS RECOMMENDED TO RUN THIS AGAINST A STAGING OR DEVELOPMENT SYSTEM DUE TO THE      *
* POTENTIAL IMPACT ON THE SERVER WHILE IT IS RUNNING. AT THE VERY LEAST, RUN THIS       *
* DURING YOUR SERVER'S QUIETEST PERIOD.                                                 *
****************************************************************************************/
/****************************************************************************************
* The procedure runs sp_estimate_data_compression_savings against all tables and all    *
* schemas in either a single database or every user database.                           *
* It accepts two parameters. The first, @checkdb, is the database that you wish to      *
* estimate compression for. The second, @admindbname, is the database into which you    *
* wish to place the results table dbo.estimated_data_compression_savings. By default    *
* this will be the database in which the procedure was created.                         *
* System databases are ignored, as they cannot be compressed anyway.                    *
*                                                                                       *
* Limitations:                                                                          *
*   - cannot examine individual indexes, which could prove useful                       *
*   - cannot evaluate individual partitions, which is very useful in DW systems         *
*   - no ability to pass multiple databases to the @checkdb parameter                   *
****************************************************************************************/
SET NOCOUNT ON
--declare all the variables
declare @varint int -- integer value (database_id) of the database
, @SQLCmd1 NVARCHAR(4000) -- dynamic SQL statement executed via sp_executesql
, @ParmDef1 NVARCHAR(4000) -- parameter definition for sp_executesql
, @database varchar(200) -- used by the cursor
, @sql varchar(2000) -- holds the query we'll use for running the page compression estimations
, @sql2 varchar(2000) -- holds the query we'll use for running the row compression estimations
, @schema varchar(2000) -- allows us to query multiple schemas
, @table sysname -- table we are testing compression estimations against
, @kbsize bigint -- how big the data is
, @loop int -- what we use to loop around the table listing
/****** if there is no @admindbname variable we just use the current database ******/
if @admindbname is null
begin
select @admindbname = DB_NAME()
end
/****** if we are checking just one database check that it exists ******/
if @checkdb is not null
begin
if exists (select DATABASE_id from master.sys.databases where @checkdb = name and state = 0)
begin
select name from master.sys.databases where name = @checkdb and state = 0
end
else
begin
raiserror (N'The Database %s does not exist.', 16, 0, @checkdb)
return
end
end
/****** if we have a single database to check set the cursor for that one db ******/
if @checkdb is not null
begin
select @varint = DATABASE_id from master.sys.databases where @checkdb = name and state = 0
DECLARE database_names CURSOR
FOR SELECT database_id FROM master.sys.databases
WHERE database_id = @varint
end
else
/****** otherwise set the cursor to work on all databases (non system that is) ******/
begin
DECLARE database_names CURSOR
FOR SELECT database_id FROM master.sys.databases
WHERE database_id > 4 and state = 0
end
DECLARE @database_id int
OPEN database_names
FETCH NEXT FROM database_names INTO @database_id
WHILE (@@fetch_status <> -1)
BEGIN
IF (@@fetch_status <> -2)
BEGIN
/****** initialize the @checkdb parameter even though defined at the start ******/
select @checkdb = name from master.sys.databases where database_id = @database_id
--print @checkdb
/****** if the table we are dumping data into doesn't exist, let's create it ******/
SET @SQLCmd1 = N'IF NOT EXISTS (SELECT 1 FROM '+@AdminDBName+'.sys.tables
WHERE name = '+char(39)+'estimated_data_compression_savings'+char(39)+')
CREATE TABLE '+@AdminDBName+'.dbo.estimated_data_compression_savings
(database_name sysname
, [object_name] sysname
, [schema_name] sysname
, current_size_KB bigint
, estimated_size_page_KB bigint
, estimated_page_savings_KB bigint
, estimated_page_savings_percent decimal(15,2)
, estimated_size_row_KB bigint
, estimated_row_savings_KB bigint
, estimated_row_savings_percent decimal(15,2)) '
SET @ParmDef1 = N'@AdminDBName VARCHAR(255)'
EXEC sp_executesql @SQLCmd1
,@parmdef1
,@AdminDBName
/****** clear out any existing data from the table for the database we are checking ******/
SET @SQLCmd1 = N'DELETE FROM '+@AdminDBName+'.dbo.estimated_data_compression_savings
WHERE database_name = '+CHAR(39)+@checkdb+char(39)+''
SET @ParmDef1 = N'@AdminDBName VARCHAR(255), @checkdb NVARCHAR(255)'
EXEC sp_executesql @SQLCmd1
,@parmdef1
,@AdminDBName, @checkdb
/****** just in case the temp table exists, get rid of it ******/
IF OBJECT_ID('tempdb..#temp') IS NOT NULL
DROP TABLE #temp
--IF OBJECT_ID('tempdb..#thedata') IS NOT NULL
-- DROP TABLE #thedata
/****** create the temp table to hold the current size information for each table in the database ******/
CREATE TABLE #temp (
theid INT IDENTITY(1,1),
table_name sysname ,
row_count BIGINT,
reserved_size VARCHAR(50),
data_size VARCHAR(50),
index_size VARCHAR(50),
unused_size VARCHAR(50))
/****** get the list of tables (with their schemas) for this database ******/
SET @SQLCmd1 = N'INSERT #temp (table_name, row_count,reserved_size,data_size,index_size,unused_size)
EXEC ['+@checkdb+']..sp_msforeachtable '+char(39)+'sp_spaceused'+char(39)+char(39)+'?'+char(39)+char(39)+char(39)+''
SET @ParmDef1 = N'@checkdb NVARCHAR(255)'
EXEC sp_executesql @SQLCmd1
,@parmdef1
,@checkdb
/****** and another temp table to create ******/
/****** for some reason this doesn't work with a # table ******/
IF OBJECT_ID('tempdb..thedata') IS NOT NULL
DROP TABLE tempdb..thedata
CREATE TABLE tempdb..thedata (
theid int
,table_name nvarchar(750)
,schemaname sysname
,row_count BIGINT
,col_count INT
,data_size BIGINT )
/****** get the information about the tables ******/
SET @SQLCmd1 = N'INSERT INTO tempdb..thedata
SELECT a.theid,
a.table_name,
s.name as schemaname,
a.row_count,
COUNT(*) AS col_count,
CAST(REPLACE(a.data_size, '+CHAR(39)+'KB'+CHAR(39)+','+CHAR(39)+CHAR(39)+') AS integer) as data_size
FROM #temp a
INNER JOIN ['+@checkdb+'].information_schema.columns b
ON a.table_name COLLATE database_default
= b.table_name COLLATE database_default
INNER JOIN ['+@checkdb+'].sys.objects o
ON a.table_name COLLATE database_default
= o.name COLLATE database_default
INNER JOIN ['+@checkdb+'].sys.schemas s
ON o.schema_id = s.schema_id
GROUP BY a.table_name, a.row_count, a.data_size, s.name,a.theid
ORDER BY CAST(REPLACE(a.data_size, '+CHAR(39)+'KB'+CHAR(39)+','+CHAR(39)+CHAR(39)+') AS integer) DESC'
--select * from #thedata
SET @ParmDef1 = N'@checkdb NVARCHAR(255)'
EXEC sp_executesql @SQLCmd1
,@parmdef1
,@checkdb
/****** get rid of the temp table, we don't need it any more ******/
DROP TABLE #temp
/****** create a couple of tables to hold the compression test information ******/
IF OBJECT_ID('tempdb..#data_compression_page') IS NOT NULL
DROP TABLE #data_compression_page
IF OBJECT_ID('tempdb..#data_compression_row') IS NOT NULL
DROP TABLE #data_compression_row
create table #data_compression_page
([object_name] sysname
, [schema_name] sysname
, index_id int
, partition_number int
, [size_with_current_compression_setting(KB)] bigint
, [size_with_requested_compression_setting(KB)] bigint
, [sample_size_with_current_compression_setting(KB)] bigint
, [sample_size_with_requested_compression_setting(KB)] bigint)
create table #data_compression_row
([object_name] sysname
, [schema_name] sysname
, index_id int
, partition_number int
, [size_with_current_compression_setting(KB)] bigint
, [size_with_requested_compression_setting(KB)] bigint
, [sample_size_with_current_compression_setting(KB)] bigint
, [sample_size_with_requested_compression_setting(KB)] bigint)
/****** here we are going to loop through the table/schema list and check the
estimated savings for both row and page compression ******/
select @loop = min(theid) from tempdb..thedata
while @loop > 0 and @loop is not null
begin
select @table = table_name from tempdb..thedata where theid = @loop
select @schema = schemaname from tempdb..thedata where theid = @loop
/****** this is where we build the statement to estimate the savings that compression could provide ******/
/****** at the page level ******/
select @sql = 'insert into #data_compression_page exec ['+@checkdb+']..sp_estimate_data_compression_savings
@schema_name = '''+@schema+'''
, @object_name = '''+@table+'''
, @index_id = null
, @partition_number = null
, @data_compression = ''page'';'
/****** at the row level ******/
select @sql2 = 'insert into #data_compression_row exec ['+@checkdb+']..sp_estimate_data_compression_savings
@schema_name = '''+@schema+'''
, @object_name = '''+@table+'''
, @index_id = null
, @partition_number = null
, @data_compression = ''row'';'
/****** get the data ******/
exec(@sql)
exec(@sql2)
/****** loop around ******/
select @loop = MIN(theid) from tempdb..thedata where theid > @loop and @loop is not null
end
/****** get rid of the data table, we no longer need it ******/
drop table tempdb..thedata
/****** update the main table with the page estimations ******/
SET @SQLCmd1 = N'INSERT INTO '+@AdminDBName+'.dbo.estimated_data_compression_savings
(database_name
, [object_name]
, [schema_name]
, current_size_KB
, estimated_size_page_KB
)
SELECT
'+CHAR(39)+@checkdb+char(39)+'
,[object_name]
,[schema_name]
,sum([size_with_current_compression_setting(KB)])
,sum([size_with_requested_compression_setting(KB)])
FROM #data_compression_page
group by [object_name]
,[schema_name]
'
SET @ParmDef1 = N'@AdminDBName VARCHAR(255), @checkdb NVARCHAR(255)'
EXEC sp_executesql @SQLCmd1
,@parmdef1
,@AdminDBName, @checkdb
/****** we create a table to hold the results of estimated row compression ******/
IF OBJECT_ID('tempdb..#rowupdates') IS NOT NULL
DROP TABLE #rowupdates
CREATE TABLE #rowupdates
(database_name sysname
, [object_name] sysname
, [schema_name] sysname
, current_size_KB bigint
, estimated_size_row_KB bigint
, estimated_row_savings_KB bigint
, estimated_row_savings_percent decimal(15,2))
/****** then we insert the estimated savings based on row compression ******/
SET @SQLCmd1 = N'INSERT INTO #rowupdates
(database_name
, [object_name]
, [schema_name]
, current_size_KB
, estimated_size_row_KB
)
SELECT
'+CHAR(39)+@checkdb+char(39)+'
,[object_name]
,[schema_name]
,sum([size_with_current_compression_setting(KB)])
,sum([size_with_requested_compression_setting(KB)])
FROM #data_compression_row
group by [object_name]
,[schema_name]
'
SET @ParmDef1 = N'@AdminDBName VARCHAR(255), @checkdb NVARCHAR(255)'
EXEC sp_executesql @SQLCmd1
,@parmdef1
,@AdminDBName, @checkdb
/****** now update the master table with the row compression estimations ******/
SET @SQLCmd1 = N'UPDATE '+@AdminDBName+'.dbo.estimated_data_compression_savings
set estimated_size_row_KB = r.estimated_size_row_KB
from #rowupdates r
INNER JOIN '+@AdminDBName+'.dbo.estimated_data_compression_savings e
ON r.object_name = e.object_name
and r.schema_name = e.schema_name
and r.database_name = e.database_name'
SET @ParmDef1 = N'@AdminDBName VARCHAR(255)'
EXEC sp_executesql @SQLCmd1
,@parmdef1
,@AdminDBName
/****** provide diff data between current and estimated ******/
SET @SQLCmd1 = N'UPDATE '+@AdminDBName+'.dbo.estimated_data_compression_savings
set estimated_page_savings_KB = current_size_KB - estimated_size_page_KB
, estimated_row_savings_KB = current_size_KB - estimated_size_row_KB
'
SET @ParmDef1 = N'@AdminDBName VARCHAR(255)'
EXEC sp_executesql @SQLCmd1
,@parmdef1
,@AdminDBName
/****** finally run a percentage calculation to show the estimated savings ******/
SET @SQLCmd1 = N'UPDATE '+@AdminDBName+'.dbo.estimated_data_compression_savings
set estimated_page_savings_percent =
((isnull(CAST(estimated_page_savings_KB AS Decimal(15,2)) / nullif(CAST(current_size_kb AS decimal(15,2)),0),0)) * 100)
, estimated_row_savings_percent =
((isnull(CAST(estimated_row_savings_KB AS Decimal(15,2)) / nullif(CAST(current_size_kb as decimal(15,2)),0),0)) * 100)
'
SET @ParmDef1 = N'@AdminDBName VARCHAR(255)'
EXEC sp_executesql @SQLCmd1
,@parmdef1
,@AdminDBName
END
/****** loop around the cursor ******/
FETCH NEXT FROM database_names INTO @database_id
END
CLOSE database_names
DEALLOCATE database_names
GO
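For anyone trying the revised procedure, a typical call might look like the following. The database names are placeholders, and the results land in dbo.estimated_data_compression_savings in the admin database:

-- estimate savings for a single database, writing results to the DBA database
EXEC dbo.USP_Compression_Savings_By_DB
      @checkdb = N'MyWarehouseDB'
    , @admindbname = N'DBA';

-- review the estimates, biggest page-compression savings first
SELECT *
FROM DBA.dbo.estimated_data_compression_savings
ORDER BY estimated_page_savings_KB DESC;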
david hay
October 27, 2009 at 1:25 pm
I also get the following warning when I run this. Note DBA is my working admin database where the procedure is installed.
Warning! The maximum key length is 900 bytes. The index 'sample_indexDBA05385A6FF40F888204D05C7D56D2B' has maximum length of 1040 bytes. For some combination of large values, the insert/update operation will fail.
Anyone else seen this?
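The warning appears to come from the sample index that sp_estimate_data_compression_savings builds internally, which suggests one of the source tables has an index whose declared key columns can exceed 900 bytes. A rough sketch (standard catalog views only, run in the database being estimated) that may help find candidates:

SELECT OBJECT_SCHEMA_NAME(i.object_id) AS schema_name
     , OBJECT_NAME(i.object_id)        AS table_name
     , i.name                          AS index_name
     , SUM(c.max_length)               AS max_key_length_bytes
FROM sys.indexes i
JOIN sys.index_columns ic ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns c ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE ic.is_included_column = 0   -- key columns only
  AND i.index_id > 0              -- ignore heaps
GROUP BY i.object_id, i.name
HAVING SUM(c.max_length) > 900;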
Thanks!
David Hay
david hay
November 18, 2012 at 5:45 pm
My table is already compressed. Is there a way I can find out to what extent it is compressed, or how much space the compression is saving?
Thanks
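Two sketches that may help here, rather than a definitive answer: sys.partitions shows which compression setting is currently applied, and sp_estimate_data_compression_savings can be asked to estimate the size with compression removed ('NONE'), which gives a rough idea of how much the current compression is saving. The table name below is a placeholder:

-- what compression is applied right now, per index and partition
SELECT OBJECT_NAME(p.object_id) AS table_name
     , p.index_id
     , p.partition_number
     , p.data_compression_desc
FROM sys.partitions p
WHERE p.object_id = OBJECT_ID('dbo.MyCompressedTable');

-- estimate how big the data would be with compression removed,
-- i.e. roughly how much the current compression is saving
EXEC sp_estimate_data_compression_savings
      @schema_name = 'dbo'
    , @object_name = 'MyCompressedTable'
    , @index_id = NULL
    , @partition_number = NULL
    , @data_compression = 'NONE';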