In one of my recent assignments, my client asked me for a way to reduce the disk space required by the staging database of an ETL workload.
That led me to study the Table Compression feature of SQL Server and compare it against uncompressed tables. This article does not explain how Compression works; it compares the storage and performance aspects of compressed vs. non-compressed tables. For background, I found a useful article on Compression written by Gerald Britton. It’s quite comprehensive and covers most aspects of the feature.
For my POC, I used an SSIS package with 2 data flows that shared the same table and file structure: one loaded a table with Table Compression enabled, the other a table without it. The table and file had around 100 columns, all of the VARCHAR datatype, since the POC was for a staging database that temporarily holds raw data from flat files. I also had to convert the flat file source output columns to make them compatible with the destination SQL Server table structure.
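To make the setup concrete, here is a minimal Python sketch that generates the T-SQL for two such staging tables, one plain and one compressed. The table names, the `Col001`-style column names, the `VARCHAR(255)` length, and the choice of PAGE compression are all assumptions for illustration; the actual schema and compression type used in the POC are not stated here.

```python
def staging_table_ddl(table_name, column_count=100, compression=None):
    """Build a CREATE TABLE statement for an all-VARCHAR staging table.

    compression: None (no compression clause), "ROW" or "PAGE".
    """
    if compression not in (None, "ROW", "PAGE"):
        raise ValueError("compression must be None, 'ROW' or 'PAGE'")

    # Hypothetical column names Col001..ColNNN, all nullable VARCHAR(255).
    columns = ",\n".join(
        f"    [Col{i:03d}] VARCHAR(255) NULL" for i in range(1, column_count + 1)
    )
    ddl = f"CREATE TABLE [dbo].[{table_name}] (\n{columns}\n)"

    # SQL Server takes the compression setting as a table option.
    if compression is not None:
        ddl += f"\nWITH (DATA_COMPRESSION = {compression})"
    return ddl + ";"
```

Calling `staging_table_ddl("Staging_Plain")` and `staging_table_ddl("Staging_Compressed", compression="PAGE")` would give two otherwise identical 100-column tables, so any difference measured between them can be attributed to compression.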
We ran the POC with various file sizes because it also served to identify the optimal file size. So we did 2 things in a single POC: comparing Compression and finding the optimal file size for the ETL process. The setup itself was very simple: both data flows had a flat file as the source and a SQL Server table as the destination.
Here is the comparison recorded after the POC. I hope you find it useful in deciding whether it’s worth implementing Compression in your own workload.
Findings
- Space saving: Approx. 87%.
- Write execution time: No difference.
- Read execution time: Slight / negligible difference. A plain SELECT statement was executed to compare the read execution time. The compressed table took 10-20 seconds longer, which is less than 2%. Compared to the disk space saved, this slight overhead was acceptable in our workload. However, you need to review your own case thoroughly before making a decision.
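If you repeat the comparison on your own workload, the two percentages above can be derived from raw measurements as in the following Python sketch. The function names and the sample numbers are placeholders chosen for illustration, not the POC's raw data; the table sizes could come, for example, from `sp_spaceused`, and the timings from running the same SELECT against both tables.

```python
def space_saving_pct(uncompressed_size, compressed_size):
    """Percentage of disk space saved by compression (same unit for both sizes)."""
    if uncompressed_size <= 0:
        raise ValueError("uncompressed_size must be positive")
    return (uncompressed_size - compressed_size) / uncompressed_size * 100


def read_overhead_pct(plain_seconds, compressed_seconds):
    """Extra read time of the compressed table, relative to the plain table."""
    if plain_seconds <= 0:
        raise ValueError("plain_seconds must be positive")
    return (compressed_seconds - plain_seconds) / plain_seconds * 100


# Placeholder measurements, NOT the POC's raw numbers.
saving = space_saving_pct(uncompressed_size=1000, compressed_size=130)
overhead = read_overhead_pct(plain_seconds=1200, compressed_seconds=1220)
print(f"space saving: {saving:.1f}%")
print(f"read overhead: {overhead:.1f}%")
```

Note that a relative overhead only means something alongside the baseline: a 20-second gap staying under 2% implies the uncompressed read itself ran for more than 1,000 seconds.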