Overview
Two compression technologies are introduced in SQL Server 2008: backup compression and table/index compression. Backup compression is enabled by simply adding WITH COMPRESSION to the normal list of backup options, and table/index compression can be applied with simple SSMS commands or T-SQL scripts. Table/index compression can be done at either the page or the row level. Both of these technologies are only available in the Enterprise edition.
As a DBA responsible for managing our SQL backups and retention, as well as several fairly large databases that will soon be moved to SQL 2008, both of these technologies piqued my interest. I did some simple testing to see how effective the compression is and what performance impact it has on the database server, and thought I would share the results.
Setup/Test Procedure
I did the testing on a workstation-class machine running Windows 7 with a dual-core 2.8 GHz processor and 4GB of RAM. SQL Server 2008 Enterprise Edition was installed on a non-RAID OS drive, and the database files were placed on a RAID 0 array of two SCSI drives.
A test database was created with a single .mdf and .ldf file and three identical tables. In addition to the primary key, each table contained two INT fields, a VARCHAR(MAX) field, a BIGINT field, a CHAR(5000) field, a DECIMAL field, and a DATETIME field. A clustered index was placed on the primary key field, and a non-clustered index was created on the two INT fields. One table was kept uncompressed; one table (including its index) was compressed using ROW compression; and the final table (including its index) was compressed using PAGE compression. The tables were each populated with the same one million random records. The scripts used to create the database and populate the tables are included should you wish to do your own experimentation.
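For reference, here is a minimal sketch of the kind of schema and compression commands described above (table, column, and index names are illustrative; the included scripts contain the exact definitions):

-- One of the three identical test tables (illustrative names)
CREATE TABLE dbo.TestRow (
    TestID   INT IDENTITY(1,1) NOT NULL,
    IntCol1  INT,
    IntCol2  INT,
    VarCol   VARCHAR(MAX),
    BigCol   BIGINT,
    CharCol  CHAR(5000),
    DecCol   DECIMAL(18, 2),
    DateCol  DATETIME,
    CONSTRAINT PK_TestRow PRIMARY KEY CLUSTERED (TestID)
);

CREATE NONCLUSTERED INDEX IX_TestRow_Ints ON dbo.TestRow (IntCol1, IntCol2);

-- After loading, apply ROW compression to the table (its clustered
-- index) and to the non-clustered index; PAGE works the same way.
ALTER TABLE dbo.TestRow REBUILD WITH (DATA_COMPRESSION = ROW);
ALTER INDEX IX_TestRow_Ints ON dbo.TestRow REBUILD WITH (DATA_COMPRESSION = ROW);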
After creating, populating, and compressing the database tables, I took a standard FULL backup, followed by a FULL backup WITH COMPRESSION; I then ran the uncompressed backup through a zip utility for comparison.
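The backup statements looked roughly like this (database name and paths are illustrative):

-- Standard full backup
BACKUP DATABASE CompressionTest
    TO DISK = N'D:\Backups\CompressionTest_Full.bak'
    WITH INIT, STATS = 10;

-- Full backup with compression enabled
BACKUP DATABASE CompressionTest
    TO DISK = N'D:\Backups\CompressionTest_Full_Compressed.bak'
    WITH INIT, COMPRESSION, STATS = 10;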
I then ran a script that does random reads and writes to the database while monitoring the server with PerfMon. The results I found follow.
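The actual test scripts are included with the article, but the load was essentially a loop along these lines (a simplified sketch; the real scripts were more involved):

DECLARE @i INT = 0;
WHILE @i < 10000
BEGIN
    -- Random read by clustered key
    DECLARE @id INT = CAST(RAND() * 1000000 AS INT) + 1;
    SELECT IntCol1, IntCol2, BigCol
    FROM dbo.TestRow
    WHERE TestID = @id;

    -- Random insert (the "additional rows" counted in the results)
    INSERT INTO dbo.TestRow (IntCol1, IntCol2, VarCol, BigCol, CharCol, DecCol, DateCol)
    VALUES (CAST(RAND() * 100000 AS INT),
            CAST(RAND() * 100000 AS INT),
            REPLICATE('x', CAST(RAND() * 100 AS INT) + 1),
            CAST(RAND() * 10000000 AS BIGINT),
            'fixed-width filler',
            CAST(RAND() * 1000 AS DECIMAL(18, 2)),
            GETDATE());

    SET @i += 1;
END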
Results
Backup Compression
The database ended up being just over 10GB after populating the three tables (one uncompressed, two compressed). A standard full backup produced a 9.26GB file in 7 minutes and 19 seconds. A full backup with compression produced a 1.80GB file in 2 minutes and 58 seconds, yielding an 80% smaller file in roughly 40% of the time. To test the effectiveness of the compression, I ran the uncompressed backup through PKZIP with the maximum compression setting, and got a 1.79GB file after an additional 8 minutes and 13 seconds. The uncompressed backup restored in 5 minutes and 26 seconds, and the compressed backup restored in 3 minutes and 5 seconds, a 43% faster restore time.
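One convenience worth noting: restoring a compressed backup requires no special option, so the standard RESTORE syntax works as-is (path is illustrative):

RESTORE DATABASE CompressionTest
    FROM DISK = N'D:\Backups\CompressionTest_Full_Compressed.bak'
    WITH REPLACE, STATS = 10;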
I haven't done extensive testing as to the reliability of compressed backups, but assuming they are as reliable as standard backups, the time and disk space savings are significant. While I was able to get slightly more compression with PKZIP, and other third-party compression utilities could possibly yield even more, the CPU time required to obtain that small difference is most likely prohibitive, especially for larger databases. The only disadvantage I can see is higher CPU usage, but I think it's more than offset by the reduced backup time, and if it is truly an issue I suspect (though I haven't proven it) that the new Resource Governor feature in SQL 2008 could keep it in check. One item worth noting is that backup compression is only available in the Enterprise edition of SQL 2008 (though according to Books Online, any edition can restore a compressed backup).
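To illustrate the Resource Governor idea (untested here, so treat it as a sketch): one could cap the CPU available to a dedicated backup login. All names below are hypothetical.

USE master;
GO
-- MAX_CPU_PERCENT only throttles when there is CPU contention.
CREATE RESOURCE POOL poolBackups WITH (MAX_CPU_PERCENT = 20);
CREATE WORKLOAD GROUP groupBackups USING poolBackups;
GO
-- Classifier routes sessions from a hypothetical backup login
-- into the capped workload group.
CREATE FUNCTION dbo.fnBackupClassifier() RETURNS SYSNAME
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @grp SYSNAME = 'default';
    IF SUSER_NAME() = 'BackupLogin'
        SET @grp = 'groupBackups';
    RETURN @grp;
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fnBackupClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;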
Table/Index Compression
After being populated with a million random records, each table contained 7990MB of data and 44MB of index data. The table compressed with row-level compression shrank to 709MB of data and 14.9MB of index data, a disk space savings of about 91%; the row-level compression took 4 minutes and 16 seconds to complete. The table compressed with page-level compression shrank to 708MB of data and 13.7MB of index data, again yielding a space savings of roughly 91%, in 4 minutes and 27 seconds.
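Incidentally, SQL 2008 also ships a stored procedure that estimates the savings before you commit to rebuilding a table, which is handy when deciding what to compress (table name is illustrative):

EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'TestRow',   -- illustrative table name
    @index_id         = NULL,        -- NULL = all indexes
    @partition_number = NULL,        -- NULL = all partitions
    @data_compression = 'PAGE';      -- or 'ROW' / 'NONE'

-- The before/after sizes quoted above can be read with:
EXEC sp_spaceused 'dbo.TestRow';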
Once the tables were compressed, I turned loose my scripts that rapidly selected and wrote data in a random fashion for each of the three tables in the database (though I made sure the execution was as identical as possible between the three runs). After 3 minutes of running the same script on each table, the uncompressed table contained 605 additional rows, the table with row-level compression contained 684 additional rows, and the table with page-level compression contained 750 additional rows. While running the tests, I also measured CPU usage, average disk sec/read and sec/write, and disk bytes/sec. The results are shown in the graphs below.
CPU usage increased by as much as 20% in the initial part of the test, but settled down to a less significant level as the test progressed. On average, the page-compressed table caused a CPU usage increase of around 6%, and the row-compressed table an increase of around 12%. If your server's CPU usage is already high, this may be enough of an increase to cause concern, but in most cases I suspect it would be insignificant.
There was a slight but noticeable decrease in the avg. disk sec/Read time, with row compression affecting the speed more than page compression. It is unlikely, though, that this savings would offset the higher CPU usage on systems with a high transaction rate.
The avg. disk sec/write counter fluctuated significantly throughout the test, and I don't feel that any conclusions can be drawn from the data collected.
As you can see from the disk bytes/sec counter, both row- and page-level compression produce considerable savings, with page-level compression yielding the biggest improvement.
Conclusion
I found the space savings and increased speed of compressed backups to be quite impressive, and I think the only reasons for not taking advantage of them would be a SQL server already struggling with high CPU usage, or simply not having the Enterprise edition.
Likewise, the space savings from both page- and row-level compression of tables and indexes are significant, and I found performance to be similar, if not slightly improved, when compressed. Again, the only disadvantages I could find were the slightly higher CPU usage and the fact that the feature is only available in the Enterprise edition. I also really appreciate that compression takes place at the table/index level rather than the database level. This allows you to selectively compress larger or less-used tables while leaving high-transaction tables uncompressed, striking a balance between space savings and performance impact. It's also worth noting that for a partitioned table, compression can be configured separately for each partition, allowing you to compress some partitions while leaving others uncompressed; used selectively, this can be highly useful in many cases, as sketched below.
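For instance, assuming a hypothetical partitioned table whose older partitions are rarely written, per-partition compression looks like this (table name and partition numbers are illustrative):

-- Compress a single partition...
ALTER TABLE dbo.Orders
    REBUILD PARTITION = 1
    WITH (DATA_COMPRESSION = PAGE);

-- ...or set different levels per partition in one statement,
-- leaving the active partition uncompressed.
ALTER TABLE dbo.Orders
    REBUILD PARTITION = ALL
    WITH (DATA_COMPRESSION = PAGE ON PARTITIONS (1 TO 3),
          DATA_COMPRESSION = NONE ON PARTITIONS (4));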
In these tests, the load was not significant enough to overly tax the CPU or the disks, and the compressed tables actually seemed to result in slightly higher read and write performance compared with the uncompressed table. While it is likely this performance increase would not occur in a highly used SQL server, in low or moderately used servers the space savings will likely offset any performance hit caused by the increased CPU usage. If compression is strategically used on specific tables in your database, it would appear to be a highly useful tool to help reduce disk space used while not incurring an unacceptable level of performance degradation.
While there are certainly cases where you would not want to use the compression features in SQL 2008, there are also cases where they can yield significant disk space savings with little or no impact on performance. It is important to test these tools in your own environment before drawing any final conclusions. For Microsoft's Books Online entries on these topics, see the links below.
Microsoft's documentation on compressed tables and indexes can be found here:
http://technet.microsoft.com/en-us/library/cc280449.aspx
Microsoft's documentation on backup compression can be found here:
http://technet.microsoft.com/en-us/library/bb964719.aspx
Disclaimer
As stated earlier, every SQL Server setup is different, and thorough testing must be done in a development environment before making a final decision on implementing the compression technologies discussed in this article.