This article is intended to compare the traditional row store indexes with the (not so new) columnstore indexes that have been introduced in SQL Server 2012 and further improved in SQL Server 2014.
A lot of information already exists on the internet with regards to the columnstore internals as detailed in this MSDN article. Therefore in this article, I will just focus on the performance improvements. A very important section of that document, which I would suggest to go through, is the Key Characteristics section. It highlights the can and can't of the columnstore indexes which are very important to understand when taking the decision to actually start using them.
Test Scenarios
I have created 5 tests and tried to keep my testing enviroment free from any other heavy load so as not to contaminate the results. I'll be basing my tests on two identical tables, which I called:
- FactTransaction_ColumnStore - This table will contain only one index, a Clustered columnstore Index. Due to the Clustered columnstore restrictions, no other index can be created on this table.
- FactTransaction_RowStore - This table will contain a Clustered Index, a non-clustered columnstore index and a row store index.
I have attached a script file with the SQL needed to create the tables and Indexes.
I have then populated the tales with 30 million rows each (using sample data from my existing DWH). Through all the tests I specified the MAXDOP Query Hint all statements, so I'll be able to test all queries using the specified number of cores.
Test 1 - Populating both tables
To better match the testing environments, I populated both tables with only their Clustered index enabled. One table is made up of a clustered columnstore index while the other table is made up of only a clustered row index.
IO and Time Statistics
Table 'FactTransaction_ColumnStore'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'FactTransaction'. Scan count 1, logical reads 73462, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. (30000000 row(s) affected) SQL Server Execution Times: CPU time = 98204 ms, elapsed time = 109927 ms. Table ' FactTransaction_RowStore '. Scan count 0, logical reads 98566047, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'FactTransaction'. Scan count 1, logical reads 73462, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. (30000000 row(s) affected) SQL Server Execution Times: CPU time = 111375 ms, elapsed time = 129609 ms.
Observations from Test 1
Table Name | Time to populate | Logical Reads |
FacTransaction_ColumnStore | 1.49 mins | ? Not sure why the Logical reads is 0! |
FacTransaction_RowStore | 2.09 mins | 98566047 |
Test 2 - Comparing SEEK
Note that I have specified the FORCESEEK table hint on the RowStore table to force an index seek operation since this is the requirement for this test.
-- Comparing Seek.... SET Statistics IO,TIME ON Select CustomerFK From [dbo].FactTransaction_RowStore WITH(FORCESEEK) Where transactionSK = 4000000 OPTION (MAXDOP 1) Select CustomerFK From [dbo].FactTransaction_ColumnStore Where transactionSK = 4000000 OPTION (MAXDOP 1) SET Statistics IO,TIME OFF
IO and Time Statistics
Table 'FactTransaction_RowStore'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Table 'FactTransaction_ColumnStore'. Scan count 1, logical reads 714, physical reads 0, read-ahead reads 2510, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 83 ms.
Execution Plans
Observations from Test 2
As can be noticed from the above screenshots, an index seek is much faster on the RowStore table than on the columnstore index table. This is mainly due to the fact that SQL Server does not currently support an Index seek on a clustered columnstore index. Looking at the execution plan, an Index scan was actually performed on the columnstore index, resulting in much more logical reads, thus the decrease in performance when compared to a RowStore Index seek.
Table Name | Index Type | Logical Reads | Elapsed Time |
FacTransaction_ColumnStore | Column | 714 | 83 ms |
FacTransaction_RowStore | Row | 3 | 0 ms |
Test 3 - Comparing SCAN
Note that I have specified the FORCESCAN table hint on the RowStore table to force an index scan operation. On the ColumnStore table this hint was not actually needed because the optimzer will do an Index scan anyway.
-- Comparing Scan.... SET Statistics IO,TIME ON Select CustomerFK From [dbo].FactTransaction_RowStore WITH(FORCESCAN) Where transactionSK = 4000000 OPTION (MAXDOP 1) Select CustomerFK From [dbo].FactTransaction_ColumnStore WITH(FORCESCAN) Where transactionSK = 4000000 OPTION (MAXDOP 1) SET Statistics IO,TIME OFF
IO and Time Statistics
Table 'FactTransaction_RowStore'. Scan count 1, logical reads 12704, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 32 ms, elapsed time = 22 ms. Table 'FactTransaction_ColumnStore'. Scan count 1, logical reads 714, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 2 ms.
Execution Plan
Observations from Test 3
As can be noticed from the above screenshots, an index scan is faster when performed on a columnstore index than on a row store index. Both the logical reads and the elapsed time indicate that a columnstore index is the preferred method to use when scanning large tables, thus being the better to use on datawarehouse tables.
(In a normal day-to-day scenario, you would not want to scan a large table to retrieve records when you are expecting the result set to be just a couple of rows. An Index seek would obviously be the preferred way to go; however this scenario was created just to demostrate and compare Index scans between Column and Row store indexes.)
Table Name | Index Type | Logical Reads | Elapsed Time |
FacTransaction_ColumnStore | Column | 714 | 2 ms |
FacTransaction_RowStore | Row | 12704 | 22 ms |
Test 4 - Compare Aggregation Queries
Test on RowStore Table using the Clustered Index based on TransactionSK.
SET Statistics IO,TIME ON Select CustomerFK,BrandFK, Count(*) From [dbo].[FactTransaction_RowStore] WITH(INDEX=RowStore_FactTransaction) Group by CustomerFK,BrandFK OPTION (MAXDOP 4)
Test on RowStore table using a RowStore Index on CustomerFK and BrandFK. (Covering Index).
Select CustomerFK,BrandFK, Count(*) From [dbo].[FactTransaction_RowStore] WITH(INDEX=RowStore_CustomerFK_BrandFK) Group by CustomerFK,BrandFK OPTION (MAXDOP 4)
Test on RowStore table using a ColumnStore Index on CustomerFK and BrandFK.(Covering Index).
Select CustomerFK,BrandFK, Count(*) From [dbo].[FactTransaction_RowStore] WITH(INDEX=ColumnStore_CustomerFK_BrandFK) Group by CustomerFK,BrandFK OPTION (MAXDOP 4)
Test on the columnstore table using the Clustered Index.
Select CustomerFK,BrandFK, Count(*) From [dbo].[FactTransaction_ColumnStore] Group by CustomerFK,BrandFK OPTION (MAXDOP 4) SET Statistics IO,TIME OFF
IO and Time Statistics
Test on RowStore Table using the Clustered Index based on TransactionSK.
Table 'FactTransaction_RowStore'. Scan count 5, logical reads 45977, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 9516 ms, elapsed time = 2645 ms.
Test on RowStore table using a RowStore non clustered Index on CustomerFK and BrandFK. (Covering Index).
Table 'FactTransaction_RowStore'. Scan count 5, logical reads 71204, physical reads 0, read-ahead reads 2160, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 5343 ms, elapsed time = 1833 ms.
Test on RowStore table using a ColumnStore non clustered Index on CustomerFK and BrandFK. (Covering Index).
Table 'FactTransaction_RowStore'. Scan count 4, logical reads 785, physical reads 7, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 141 ms, elapsed time = 63 ms.
Test on the columnstore table using the Clustered Index.
Table 'FactTransaction_ColumnStore'. Scan count 4, logical reads 723, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 203 ms, elapsed time = 118 ms.
Execution Plans
Observations from Test 4
Here is where the columnstore index really starts to shine! The performance on both columnstore indexes is much better when compared to the traditional row indexes both in terms of logical reads and elapsed time.
Table Name | Index used | Index Type | Logical Reads | Elapsed Time |
FacTransaction_ColumnStore | ClusteredColumnStore | Column | 717 | 118 |
FacTransaction_RowStore | RowStore_FactTransaction | Row | 45957 | 2645 |
FacTransaction_RowStore | RowStore_CustomerFK_BrandFK | Row | 71220 | 1833 |
FacTransaction_RowStore | ColumnStore_CustomerFK_BrandFK | Column | 782 | 63 |
Test 5 - Compare Updates (Subset of data)
In this test I'll be updating slightly less than 1 million rows i.e. 1/30th of the data in the table.
SET Statistics IO,TIME ON Update [dbo].[FactTransaction_ColumnStore] Set TransactionAmount = 100 Where CustomerFK = 112 OPTION (MAXDOP 1) Update [dbo].[FactTransaction_RowStore] Set TransactionAmount = 100 Where CustomerFK = 112 OPTION (MAXDOP 1) SET Statistics IO,TIME OFF
IO and Time Statistics
Table 'FactTransaction_ColumnStore'. Scan count 2, logical reads 2020, physical reads 0, read-ahead reads 2598, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. (913712 row(s) affected) SQL Server Execution Times: CPU time = 27688 ms, elapsed time = 37638 ms. Table 'FactTransaction_RowStore'. Scan count 1, logical reads 2800296, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. (913712 row(s) affected) SQL Server Execution Times: CPU time = 6812 ms, elapsed time = 6819 ms.
Execution Plans
Observations from Test 5
In this case, it was much more faster to update the RowStore clustered index table then the ColumnStore clustered index table.
Table Name | Index Type | Logical Reads | Elapsed Time |
FactTransaction_ColumnStore | columnstore | 2020 | 37638 ms |
FactTransaction_RowStore | Row Store | 2800296 | 6819 ms |
Note that the logical reads for the Row Store is higher than the columnstore albeit the columnstore index took longer to update. This is due to the fact that the columnstore index can achieve a higher compression ratio, thus a smaller memory footprint.
Conclusions
The columnstore indexes (both clustered & non clustered) do offer a number of advantages over the row store indexes. However, in my opinion the datawarehouse on which they will be implemented needs to be 'prepared' to support them. One such instance is that non clustered indexes cannot be updated and they have to be disabled for the underlying table to get updated. If the table is huge and there are no partitions, this may become a problem because the whole table index would need to be rebuilt every time, which may be prohibitive if the table is huge! Therefore, in this case a good partition strategy must already be in place to support such indexes.
A very good candidate for these types of indexes is obviously large fact tables on which aggregation will be performed. I also think that columnstore indexes are also a good candidate to be used on Fast Track Data Warehouse Servers since this setup usually involves a lot of Scans. Also, if an SSAS cube is in place, from the above test, it is evident that reading large amounts of data from a columnstore index is faster than reading from its counter part. Hence this can help in improving the cube processing time.