Sometimes it is not clear to everyone if and how the different types of indexes influence compression, as well as the impact on the disk space used for the compressed and indexed tables. In the following article, I will perform a small investigation, so that the reader can get an impression on the impact on disk space before and after compression, considering different index types. This comparison will not consider the performance implications of the compression and the corresponding advantages and disadvantages. It is just looking at the space used. On the internet you can find many good articles about the performance implications.
For this test I am using an example Global Master data file with no normalization applied and non-optimal selection of datatypes. The file contains 44 columns with different data types and 279,690 rows. The smallest data type is a varchar(1) and the biggest is a varchar(255). The table has just 1 partition.
This table has a lot more potential for reducing the table size, but in this test, I will just focus on a comparison on Heap tables, Clustered index tables and Clustered Columnstore index tables with compression settings of:
- Page level
- Columnstore
- Archive
- No compression
It is not possible to create a table with no compression and a Columnstore Index, because Columnstore tables and indexes are always stored with columnstore compression. This will give 6 different scenarios:
Type of Compression | Type of Index |
No Compression | Heap |
No Compression | Clustered |
Columnstore | Clustered Columnstore |
Page Level Compression | Heap |
Archive | Clustered Columnstore |
Page Level Compression | Columnstore |
The Indexed key column is the Material number. It is null but doesn’t have any NULLs included. For the Clustered Index I use a FILLFACTOR of 100. To get the used space of the table the stored procedure sp_spaceused will be used. The used output of this SP contains:
Column name | Description |
rows | Number of rows existing in the table. |
data | Total amount of space used by data. |
index_size | Total amount of space used by indexes. |
This is the script to get the output:
--Script to create the table, which give information of the used space the tables.
DROP TABLE #spaceused;
GO
CREATE TABLE #spaceused
(name NVARCHAR(128),
rows CHAR(20),
reserved VARCHAR(18),
data VARCHAR(18),
index_size VARCHAR(18),
unused VARCHAR(18)
);
INSERT INTO #spaceused
EXEC sp_spaceused
N'[core].[NoComp_ClustInd]';
INSERT INTO #spaceused
EXEC sp_spaceused
N'[core].[ClustColumnStInd]';
INSERT INTO #spaceused
EXEC sp_spaceused
N'[core].[NoComp_Heap]';
INSERT INTO #spaceused
EXEC sp_spaceused
N'[core].[PageLevel_ClustInd]';
INSERT INTO #spaceused
EXEC sp_spaceused
N'[core].[Archive_ClustColumnStInd]';
INSERT INTO #spaceused
EXEC sp_spaceused
N'[core].[PageLevel_Heap]';
SELECT name,
rows,
data,
index_size,
FROM #spaceused;
This is the table overview before the indexes are created and the compression is used. The column Total Size is the sum of the columns data and Index_size. The columns data, index_size and total Size are in using the unit of measure KB.
name | rows | data | index_size | Total Size |
NoComp_ClustInd | 279690 | 122024 | 16 | 122.040 |
ClustColumnStInd | 279690 | 122024 | 16 | 122.040 |
NoComp_Heap | 279690 | 122024 | 16 | 122.040 |
PageLevel_ClustInd | 279690 | 122024 | 16 | 122.040 |
Archive_ClustColumnStInd | 279690 | 122024 | 16 | 122.040 |
PageLevel_Heap | 279690 | 122024 | 16 | 122.040 |
This is the table after the indexes are created on the table. The Heap table doesn’t get a new index for sure:
name | rows | data | index_size | Total Size |
NoComp_ClustInd | 279690 | 123168 | 480 | 123.648 |
ClustColumnStInd | 279690 | 34016 | 0 | 34.016 |
NoComp_Heap | 279690 | 122024 | 16 | 122.040 |
PageLevel_ClustInd | 279690 | 123160 | 480 | 123.640 |
Archive_ClustColumnStInd | 279690 | 34016 | 0 | 34.016 |
PageLevel_Heap | 279690 | 122024 | 16 | 122.040 |
Now we create Indexes on the analyzed tables.
CREATE CLUSTERED INDEX IX_NoComp_ClustInd ON core.NoComp_ClustInd (Material) WITH( FILLFACTOR = 100) ON [PRIMARY]; CREATE CLUSTERED COLUMNSTORE INDEX IX_ClustColumnStInd ON core.ClustColumnStInd ON [PRIMARY]; CREATE CLUSTERED INDEX IX_PageLevel_ClustInd ON core.PageLevel_ClustInd (Material) WITH( FILLFACTOR = 100) ON [PRIMARY]; CREATE CLUSTERED COLUMNSTORE INDEX IX_Archive_ClustColumnStInd ON core.Archive_ClustColumnStInd ON [PRIMARY];
Now let's use Compression on the analysed tables.
ALTER TABLE [core].PageLevel_ClustInd REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE); ALTER TABLE [core].Archive_ClustColumnStInd REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE) ALTER TABLE [core].PageLevel_Heap REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE);
This is how the used space look after using the compression on the table:
name | rows | data | index_size | Total Size |
NoComp_ClustInd | 279690 | 123168 | 480 | 123.648 |
ClustColumnStInd | 279690 | 34016 | 0 | 34.016 |
NoComp_Heap | 279690 | 122024 | 16 | 122.040 |
PageLevel_ClustInd | 279690 | 34184 | 192 | 34.376 |
Archive_ClustColumnStInd | 279690 | 22792 | 0 | 22.792 |
PageLevel_Heap | 279690 | 36824 | 16 | 36.840 |
As we can see the overall winner is the Columnstore Index. It uses the least space, even without archive compression. But if you use archive compression, the used space is by far the lowest. This makes it the favorite, if your target is use the lowest amount of used space. The second is the Columnstore Index without compression. In third place we see the clustered index, which is using the Page Level compression, but very close to it is the heap table with Page Level Compression.
Another finding is, that we see differences in the used space between the clustered index table and the heap table. Even without considering the used space for the index we see a difference. Before the compression the heap table is using less space than the table with the clustered index. But after the compression it changes and the table with the clustered index is using slightly less space.
From the results of this example I can see three groups in terms of space usage:
- Columnstore Indexes with Archive Compression (very low space usage)
- Page Level compression with and without an clustered index and Columnstore index without Archive Compression (low to middle space usage)
- Tables without Compression and Indexes (high space usage)
The main finding is that the kind of Index will have influence on the used disk space before and after the compression. The interesting thing is that the table with the column store index is using always the least disk space.
This is just a test that is based on one specific table. It could give different results on other tables. Maybe even an optimization (normalization, correct datatypes,…) of the table could change the results. This comparison is just to get a feeling for the handling of different compressions and clustered indexes.