One of the great features available since SQL Server 2008 is Row and/or Page compression. However, for those of you still on SQL 2005 in production, who might be there for a little while: there's a decent feature you can take advantage of too, one that is quite expensive to obtain in SQL 2008. Why? Well, compression in SQL 2008/2012 requires Enterprise edition to be installed, so if you want to benefit from this mega space saver (like 1.3TB down to 650GB as I have seen, plus queries running in as little as 40% of the time they took before) you will be happy with VARDECIMAL conversion in SQL 2005 (SP2+). Both have stored procedures you can run to estimate how much space you will save: in 2008 it's sp_estimate_data_compression_savings (from Microsoft Best Practices), which samples rows and gives you quite an accurate estimation of the space saved/gained. Be aware that the cost of compression is CPU usage, so apply it on a server with a relatively low average CPU percentage.
Another condition you should be aware of: if your system is under input/output pressure, meaning you are waiting on your disks, then you stand to benefit from compression. With respect to performance gains, for the most accurate measure of logical reads saved, run this command while executing your queries against the database that is a candidate for compression:
SET STATISTICS IO ON
This is the best way to evaluate page-read details when comparing before and after compression. The SSMS shortcut Ctrl+M (Include Actual Execution Plan) is also an alternative for the queries you typically run against the large objects. For detail, see Brad McGehee's compression examples here, as well as an updated version for 2012.
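As a quick illustration (a minimal sketch; the table and column names are hypothetical), run the same batch before and after compression and compare the 'logical reads' figures on the Messages tab:
SET STATISTICS IO ON
GO
-- any representative query against the compression candidate
SELECT SUM(DecimalColumn) FROM SchemaName.BigTableLoadedWithDecimals
GO
SET STATISTICS IO OFF
GO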
In SQL 2005, from SP2 onwards, you can do this:
(ref: http://msdn.microsoft.com/en-us/library/bb508963.aspx )
-- first run the estimate, and be patient; its output shows up like a console command (DBCC)
EXEC sys.sp_estimated_rowsize_reduction_for_vardecimal 'SchemaName.TableName'
GO
EXEC sys.sp_db_vardecimal_storage_format 'DatabaseName', 'ON'
GO
EXEC sp_tableoption 'dbo.BigTableLoadedWithDecimals', 'vardecimal storage format', 1
GO
-- (end of 2005 options)
-- For 2008, all SQL Server internal compression is done at the table or index level,
-- and 2008 has much better compression capability, namely ROW and PAGE, as you see below
EXEC sp_estimate_data_compression_savings 'SchemaName', 'TableName', NULL, NULL, 'ROW';
GO
EXEC sp_estimate_data_compression_savings 'SchemaName', 'TableName', NULL, NULL, 'PAGE';
GO
-- example testing started on a table that was 260MB, with a check on the storage used each time
-- I/O cost, according to the Actual Execution Plan for a normal SELECT, started at 24.17 (no compression)
--first round of compression
ALTER TABLE SchemaName.TableName REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = ROW) -- table became 180MB
--second round of compression
ALTER TABLE SchemaName.TableName REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE) -- table now down to 80MB
-- I/O cost, according to the Actual Execution Plan for a SELECT down to 7.62 (just under a third)
-- then, only after all the desired tables are compressed (in 2005 or 2008 as above) run the console command
DBCC SHRINKDATABASE(DatabaseName,0)
-- recover the free space (unless you need it for short-term db growth)
-- the 0 value after the DatabaseName can be replaced with the amount of space you want to leave free
-- for expansion; normally, if you have huge decimal types in tables with millions of rows,
-- you'll see a huge difference in the size of the db after the shrink.
Please take care while you are doing your big-table conversions: make sure you have enough space for the whole table to be added onto the MDF file, and expect a relatively heavy load on the LDF [log] file. What you can do is switch to the Bulk-Logged recovery model while you are doing the compression, then switch back to Full (or whatever recovery model you were using before). Start with the smaller objects first, so that you progressively free up space without so much risk of swelling, as if you were a pilot, easing up the throttle gracefully.
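For example (a sketch with a hypothetical database name; adapt it to your own backup strategy):
ALTER DATABASE DatabaseName SET RECOVERY BULK_LOGGED
GO
-- ...run the compression rebuilds here...
ALTER DATABASE DatabaseName SET RECOVERY FULL
GO
-- take a log backup afterwards so point-in-time recovery is possible again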
Basically, what you are doing is clamping down on the wasted space per row. ROW compression in SQL 2008 does this by storing fixed-length types in a variable-length format, and PAGE compression adds prefix and dictionary compression on top; SQL 2005 SP2's vardecimal is the same idea, limited to the decimal data type only. Obviously, the 2008 higher-end editions have upped the cost for you to have this functionality, but it goes with the territory: a large organisation usually has major space to gain, and don't forget that it's not only the space in the database you are gaining, it's all the cumulative space in backups over time you'll save, plus the improved execution time for queries. One could argue simply that by taking a query's I/O cost, multiplied by its frequency of execution, you get the I/O hours saved. Hey boss, where's my bonus, I just saved the company n,000 hours of I/O execution time :) Or better yet, calculate your SAN cost per GB, multiplied by the number of gigabytes saved. Remember that clients will get their data result sets in less than half the time in most cases!
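If you want to put a number on that argument, here is a rough sketch (assuming SQL 2005+ and a reasonably warm plan cache) that ranks cached queries by cumulative logical reads, i.e. reads per execution times execution count:
SELECT TOP (20)
    qs.execution_count,
    qs.total_logical_reads,
    qs.total_logical_reads / qs.execution_count AS avg_logical_reads_per_exec
FROM sys.dm_exec_query_stats qs
ORDER BY qs.total_logical_reads DESC
GO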
You could also push the rebuild work through the temp database, adding the compression to the specific object during the rebuild, but make sure tempdb is as big as the object you are rebuilding (the index + table data size, with a good fifty percent on top of that to be sure you're not going to run out of room for other applications); see the sketch below.
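A minimal sketch of that option, reusing the hypothetical table name (SORT_IN_TEMPDB moves the intermediate sort results into tempdb):
ALTER TABLE SchemaName.TableName REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE, SORT_IN_TEMPDB = ON)
GO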
The Data Compression Wizard in SQL Server Management Studio demystifies much of this, if you are already lost. Mind you, it's not very efficient if you have to do this on a whole bunch of objects; in that case you would be better off using T-SQL (and this great script from SQL Solace to prepare the uncompressed objects: generate ROW compression first, then PAGE). Sort your tables by the largest first (data pages should be in the thousands) and evaluate the space savings; a query for that triage follows below. If you are in the millions of data pages, use this compression. As mentioned before, even on a SQL 2005 post-Service Pack 2 build you can take advantage of row-level compression by means of VARDECIMAL (though sadly not true ROW or PAGE compression).
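A sketch of that triage query (assuming SQL 2005 SP2+; run it in the target database):
SELECT s.name AS SchemaName, t.name AS TableName,
    SUM(ps.used_page_count) AS UsedPages
FROM sys.dm_db_partition_stats ps
JOIN sys.tables t ON t.object_id = ps.object_id
JOIN sys.schemas s ON s.schema_id = t.schema_id
GROUP BY s.name, t.name
ORDER BY UsedPages DESC
GO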
If you are using tables with multiple non-clustered indexes, only compress those indexes that are used occasionally; compressing heavily used indexes is to be avoided. So, as always, to be sure: TEST, TEST, TEST... on several servers - your test environment, your disaster recovery environment, your dev., and even your laptop (still enjoying disk performance for databases on SSDs, and thankfully the price per GB has come down to around $1). The sketch below helps you see how hot each index is first. Check out compression on your development environment just to save space on large tables (dev environments are usually starved for disk space), and then just watch what happens over the next while... check whether queries are taking forever to run, and let a few of the developers know - maybe they'll see something you missed too. So, SQL DBAs - get out there and compress!
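A sketch of that check (assuming SQL 2005+; note the counters reset on instance restart, and the table name is hypothetical):
SELECT i.name AS IndexName,
    us.user_seeks + us.user_scans + us.user_lookups AS TotalReads,
    us.user_updates AS TotalWrites
FROM sys.indexes i
LEFT JOIN sys.dm_db_index_usage_stats us
    ON us.object_id = i.object_id
    AND us.index_id = i.index_id
    AND us.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('SchemaName.TableName')
    AND i.type_desc = 'NONCLUSTERED'
GO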
BONUS - After reading up on index options, I noticed, way down at the very bottom of the ALTER INDEX page, another option to save disk space and improve performance (SQL 2008+): you can compress indexes too:
ALTER INDEX IX_INDEX1
ON TableName
REBUILD
WITH ( DATA_COMPRESSION = ROW )
GO
ALTER INDEX IX_INDEX1
ON TableName
REBUILD
WITH ( DATA_COMPRESSION = PAGE )
GO