Continuation from the previous 73 parts, the whole series can be found at http://www.nikoport.com/columnstore/.
Row Group merging & cleanup is a very long waited improvement that came out in SQL Server 2016. Once Microsoft has announced this functionality, everyone who has worked with SQL Server 2014 & Clustered Columnstore Indexes has rejoiced – one of the major problems with logical fragmentation because of the deleted data is solved! Amazing!
Just as a reminder – logical fragmentation is the process when we mark obsolete data in the Deleted Bitmap (in Columnstore Indexes there is no direct data removal from the compressed Segments with Delete command and Update command uses Deleted Bitmap as well marking old versions of rows as deleted).
If you want to understand more of this problem – please visit Columnstore Indexes – part 36 (“Maintenance Solutions for Columnstore”) & Columnstore Indexes – part 34 (“Deleted Segments Elimination”) for more information.
From my side, I had a connect item with a suggestion Columnstore Segments Maintenance – Remove & Merge that was closed with information that the feature will be included in SQL Server 2016.
The feature was already to be found in the first public CTP (CTP 2.0) and it is executed online being this way one of the first online-executed features for Columnstore Indexes!
On social media, there are enough people & companies promoting this feature.
Everyone is happy! It’s incredible!
But … what are its functionalities and limitations ?
Let me guide you through some basic tests that I did.
Row Groups Merging
According to the official blogpost from Denzil Ribeiro (SQLCAT Team) explaining that in SQL Server 2016 there are 2 types of Merge processes:
– Self-Merging (this is when Row Group is being recompressed eliminating obsolete row versions that are found in Deleted Bitmap)
– Intergroup Merging (when the sum of active rows between multiple Row Groups is less then 1048576, then their informations can be merged into a new single Row Group)
From the point of view of requirements there are just a few basic things for Row Group to qualify:
– Being a compressed Row Group (Delta-Stores & Tail Row Groups are not qualified)
– A Row Group has to have at least 10% of its data being marked as obsolete in Deleted Bitmap (for InMemory this threshold is set on 90% actually).
– A Row Group should not being trimmed because of the dictionary pressure.
Cleanup
As already pointed in Columnstore Indexes – part 34 (“Deleted Segments Elimination”), there are times when we delete the whole Row Group (100% of data) and in this case, there should not be any merging, but a simple and efficient removal of this Row Group from the Columnstore Index.
The Catch
In SQL Server 2016, we simply can not talk about 1 scenario of Columnstore Indexes. There are 2 types of updatable Columnstore Indexes – Clustered & Nonclustered, plus for Clustered Columnstore Indexes we have Disk-Based & InMemory solutions and even though Disk-Based Clustered Columnstore is a single term – there is a huge difference because of the connection with secondary b-tree because of the Mapping Index existence. (
Totally we have at least 4 distinct scenarios for Columnstore Indexes:
– Disk-Based Clustered Columnstore
– Disk-Based Clustered Columnstore with secondary b-tree indexes
– Nonclustered Columnstore Index
– InMemory Clustered Columnstore Index
Even though people might say that underlying technology is the same Columnstore, let me point that architectural differences make huge impact and saying that InMemory Clustered Columnstore and disk-based Clustered Columnstore Index with b-tree indexes are the same – is plainly wrong and non-acceptable for anyone interested in technology.
The game
Let’s start this test with a fresh restored copy of my favourite base-test database – the free ContosoRetailDW that comes from Microsoft that I use the backup copied into C:\Install and the data files are placed into C:\Data:
alter database ContosoRetailDW set SINGLE_USER WITH ROLLBACK IMMEDIATE; RESTORE DATABASE [ContosoRetailDW] FROM DISK = N'C:\Install\ContosoRetailDW.bak' WITH FILE = 1, MOVE N'ContosoRetailDW2.0' TO N'C:\Data\ContosoRetailDW.mdf', MOVE N'ContosoRetailDW2.0_log' TO N'C:\Data\ContosoRetailDW.ldf', NOUNLOAD, STATS = 5; alter database ContosoRetailDW set MULTI_USER; GO GO ALTER DATABASE [ContosoRetailDW] SET COMPATIBILITY_LEVEL = 130 GO ALTER DATABASE [ContosoRetailDW] MODIFY FILE ( NAME = N'ContosoRetailDW2.0', SIZE = 2000000KB , FILEGROWTH = 128000KB ) GO ALTER DATABASE [ContosoRetailDW] MODIFY FILE ( NAME = N'ContosoRetailDW2.0_log', SIZE = 400000KB , FILEGROWTH = 256000KB ) GO
Disk-based Clustered Columnstore
Let’s start with the Data Warehouse scenario and create a copy of FactOnlineSales table with Clustered Columnstore Index and nothing else.
In the following code I am creating a FactOnlineSales_CCI table with a single Clustered Columnstore and nothing else, loading the data into the Columnstore Index in the way that I shall get the best possible Segment Elimination on the OnlineSalesKey column:
CREATE TABLE [dbo].[FactOnlineSales_CCI]( [OnlineSalesKey] [int] NOT NULL, [DateKey] [datetime] NOT NULL, [StoreKey] [int] NOT NULL, [ProductKey] [int] NOT NULL, [PromotionKey] [int] NOT NULL, [CurrencyKey] [int] NOT NULL, [CustomerKey] [int] NOT NULL, [SalesOrderNumber] [nvarchar](20) NOT NULL, [SalesOrderLineNumber] [int] NULL, [SalesQuantity] [int] NOT NULL, [SalesAmount] [money] NOT NULL, [ReturnQuantity] [int] NOT NULL, [ReturnAmount] [money] NULL, [DiscountQuantity] [int] NULL, [DiscountAmount] [money] NULL, [TotalCost] [money] NOT NULL, [UnitCost] [money] NULL, [UnitPrice] [money] NULL, [ETLLoadID] [int] NULL, [LoadDate] [datetime] NULL, [UpdateDate] [datetime] NULL ) ON [PRIMARY] insert into [dbo].[FactOnlineSales_CCI] with (tablockx) (OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate) select OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate from dbo.FactOnlineSales; create clustered index PK_FactOnlineSales_CCI on dbo.FactOnlineSales_CCI (OnlineSalesKey) with (data_compression = page); create clustered columnstore index PK_FactOnlineSales_CCI on dbo.FactOnlineSales_CCI with (drop_existing = on, maxdop = 1); GO
I will be using CISL (open source Columnstore Indexes Scripts Library) for observing this table behaviour (if you are interested learning how to use it, visit How to use CISL – the Columnstore Indexes Scripts Library):
exec cstore_GetRowGroups @tableName = 'FactOnlineSales_CCI' exec cstore_GetRowGroupsDetails @tableName = 'FactOnlineSales_CCI';
As you can see – there are 13 Row Groups with 12.627.608 active rows and no deleted ones.
Let’s delete 200.000 rows from the very first Row Group:
;with delCTE as ( select top (200000) OnlineSalesKey from dbo.FactOnlineSales_CCI del order by OnlineSalesKey ) delete from delCTE;
For making sure this is correct, let’s re-invoke the CISL statements:
exec cstore_GetRowGroups @tableName = 'FactOnlineSales_CCI' exec cstore_GetRowGroupsDetails @tableName = 'FactOnlineSales_CCI';
As you can see on the pictures above, we have deleted 200.000 rows from the very first Row Group with ID = 0.
It’s time to see if our Tuple Mover functions as descried in the manual, let’s use ALTER INDEX … REORGANIZE:
alter index PK_FactOnlineSales_CCI on dbo.FactOnlineSales_CCI reorganize;
This time, unlike in SQL Server 2014, it will take a good number seconds (31 to be more precise on my 2-core VM) to execute this statement and besides looking for the closed Delta-Stores, this process will create a new compressed Row Group with ID = 13 with 848.576 rows (original 1048576 minus the deleted 200.000), and then will remove the deleted Row Group.
Let’s monitor this process:
exec cstore_GetRowGroupsDetails @tableName = 'FactOnlineSales_CCI';
Now in SQL Server 2016 we can see the Tombstone corresponding to the original Row Group.
A couple of seconds later the Tombstone is gone and we have our 13 Row Groups where the newest one (ID=13) is a simply copy of the first one (ID=0) without the deleted rows – with the active 848.567 rows:
Also, notice that because no Intergoup Merging took place, our Fragmentation levels are still on 0, meaning that all Segments for the OnlineSalesKey column are 100% aligned:
exec dbo.cstore_GetAlignment @columnName = 'OnlineSalesKey';
Let’s get back to the 10% of the deleted information and let’s delete exactly 10% of data from the nominally first Row Group (right now it is the one with the ID = 13) – and remember right now it has 848.567 rows:
;with delCTE as ( select top (84856) OnlineSalesKey from dbo.FactOnlineSales_CCI del order by OnlineSalesKey ) delete from delCTE;
Invoking Tuple Mover at this point will not bring us anything, since we have not deleted exactly the 10% …
alter index PK_FactOnlineSales_CCI on dbo.FactOnlineSales_CCI reorganize;
Let’s delete one more row and re-invoke the Tuple Mover … It does not help …
Deleting more hundreds and thousands rows does not help – I have reached 102.400 (the magical number used for triggering compressed Row Groups loading for Bulk Load API), but with no success:
Advancing further I have finally “upgraded” to 105.000 rows:
This time it has finally worked:
alter index PK_FactOnlineSales_CCI on dbo.FactOnlineSales_CCI reorganize;
And now you can see my observations with CISL cstore_GetRowGroupDetails after the Tuple Mover finally processed my Columnstore Index:
After observing the Tombstone generated by the Tuple Mover and then finally its removal I have arrived to the following state:
exec cstore_GetRowGroups @tableName = 'FactOnlineSales_CCI' exec cstore_GetRowGroupsDetails @tableName = 'FactOnlineSales_CCI';
As you can see, the last Row Group in the sequence (ID = 14) has 743.576 rows, exactly without the 105.000 rows that we have previously deleted and 305.000 rows later after the original 1048576 rows. This is still well-aligned and logically the first Row Group in the sequence ordered by OnlineSalesKey.
The number that was used as the tipping out is not the 10% of the active rows, but the 10% of the maximum size of the Row Group, meaning that we need to have at least 104857 rows deleted in order to trigger self-merging process by invoking tuple mover.
That’s fine for the basic test, but what about intergroup merging ?
We have a very close example – take a look at the last image and you shall notice that there are 2 row groups (ID=12 & ID=14) that in the sum are very close to make together in sum the maximum number of rows currently allowed for a Row Group: all we need to do is to delete 160.670 rows from one or both of them.
You might wonder if actually we need to remove less rows and the Row Group with ID = 6 will take part in this process.
According to the requirements described in the being of this article the Row Groups that have dictionary pressure might not take place in the merge process. In SQL Server 2016 it is extremely easy to determine the trim reasons, its enough to issue the following query (a special function is coming up soon for CISL):
select --object_name(phst.object_id) as TableName, phst.index_id, partition_number, generation, state, state_desc, total_rows, deleted_rows, size_in_bytes, trim_reason, trim_reason_desc, phst.transition_to_compressed_state, phst.transition_to_compressed_state_desc, phst.has_vertipaq_optimization from sys.dm_db_column_store_row_group_physical_stats phst inner join sys.indexes ind on phst.object_id = ind.object_id and phst.index_id = ind.index_id order by phst.object_id, phst.partition_number, phst.row_group_id;
You can learn a lot of things from this query, such as the column “generation” explains you which Row Group is the first one in reality and the “transition_to_compressed_state” explains the way how this row group was generated, plus you have a timestamp of the compression (not included in the query above), etc. Great stuff!
Let’s delete those 106.700 rows and see which row groups (if any) gets merged:
;with delCTE as ( select top (160670) OnlineSalesKey from dbo.FactOnlineSales_CCI del order by OnlineSalesKey ) delete from delCTE;
The rows were deleted, now let’s call the Tuple Mover:
alter index PK_FactOnlineSales_CCI on dbo.FactOnlineSales_CCI reorganize;
Using the good old stored procedure cstore_GetRowGroupsDetails from CISL, let’s observe the results:
exec cstore_GetRowGroupsDetails @tableName = 'FactOnlineSales_CCI'
As expected the Tuple Mover has merged the Row Groups with ID=12 & ID=14, ignoring the Row Group with ID=6, because of the dictionary pressure. Now we have a Tombstone that within a couple of seconds will be removed by the background process.
Voilá – our newest Row Group with ID = 15 and 1048567 rows is ready.
The integroup merging can work on more then 2 Row Groups and so far I have seen in this scenario with a simple Clustered Columnstore Index – it works really fine.
Before advancing to our scenarios, there is one more thing we need to test – removal of the fully deleted Row Groups – so called Cleanup.
For testing it, let’s delete 2 million rows from our Clustered Columnstore Index – they will be spread over 3 different Row Groups where 1 of them will be deleted completely:
;with delCTE as ( select top (2000000) OnlineSalesKey from dbo.FactOnlineSales_CCI del order by OnlineSalesKey ) delete from delCTE;
Let’s start the Tuple Mover:
alter index PK_FactOnlineSales_CCI on dbo.FactOnlineSales_CCI reorganize;
You can see the results by invoking the following command:
exec cstore_GetRowGroupsDetails @tableName = 'FactOnlineSales_CCI'
The Row Groups with IDs 1,2,15 have become Tombstones and now the newly converted version without deleted rows for the Row Groups with IDs 2 & 15 are already in place:
There is nothing else to add here at this point, every declared functionality of Row Group merging works fine for the simple Clustered Columnstore Index.
Disk-based Clustered Columnstore with secondary b-tree indexes
UPDATE ON 15.04.2016 – In the Release Candidates for SQL Server 2016, this situation has been already solved, and so it is now possible to have Row Group merging on the Clustered Columnstore with secondary b-tree indexes.
As I have written in Columnstore Indexes – part 66 (“More Clustered Columnstore Improvements in SQL Server 2016”), in SQL Server 2016 our Data Warehouse Clustered Columnstore table can receive secondary b-tree indexes for enabling referral integrity (foreign keys), unique keys (as well as the primary ones), and of course the short-range and point lookups.
Let us create a table with a Clustered Columnstore Index and a secondary b-tree index to see if the same Merge process works fine with them:
CREATE TABLE [dbo].[FactOnlineSales_CCI_Plus_Indexes]( [OnlineSalesKey] [int] NOT NULL, [DateKey] [datetime] NOT NULL, [StoreKey] [int] NOT NULL, [ProductKey] [int] NOT NULL, [PromotionKey] [int] NOT NULL, [CurrencyKey] [int] NOT NULL, [CustomerKey] [int] NOT NULL, [SalesOrderNumber] [nvarchar](20) NOT NULL, [SalesOrderLineNumber] [int] NULL, [SalesQuantity] [int] NOT NULL, [SalesAmount] [money] NOT NULL, [ReturnQuantity] [int] NOT NULL, [ReturnAmount] [money] NULL, [DiscountQuantity] [int] NULL, [DiscountAmount] [money] NULL, [TotalCost] [money] NOT NULL, [UnitCost] [money] NULL, [UnitPrice] [money] NULL, [ETLLoadID] [int] NULL, [LoadDate] [datetime] NULL, [UpdateDate] [datetime] NULL ) ON [PRIMARY] insert into [dbo].[FactOnlineSales_CCI_Plus_Indexes] with (tablockx) (OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate) select OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate from dbo.FactOnlineSales create clustered index PK_FactOnlineSales_CCI_Plus_Indexes on dbo.FactOnlineSales_CCI_Plus_Indexes (OnlineSalesKey) with (data_compression = page); create clustered columnstore index PK_FactOnlineSales_CCI_Plus_Indexes on dbo.FactOnlineSales_CCI_Plus_Indexes with (drop_existing = on, maxdop = 1); GO create nonclustered index NCIX_FactOnlineSales_CCI_Plus_Indexes_DateKey on dbo.FactOnlineSales_CCI_Plus_Indexes (DateKey) include (SalesQuantity) with (data_compression = page);
To monitor the changes I will use the 2 usual functions from the CISL library:
exec cstore_GetRowGroups @tableName = 'FactOnlineSales_CCI_Plus_Indexes' exec cstore_GetRowGroupsDetails @tableName = 'FactOnlineSales_CCI_Plus_Indexes'
On the outside this is the very same table with Clustered Columnstore Index 12.627.608 rows and 13 Row Groups and Row Group with ID = 6 suffering from the same dictionary pressure … Everything is fine here …
It’s time to delete 200.000 rows and invoke the Tuple Mover:
;with delCTE as ( select top (200000) OnlineSalesKey from dbo.FactOnlineSales_CCI_Plus_Indexes del order by OnlineSalesKey ) delete from delCTE;
We have deleted 200.000 rows from the very first Row Group, because I have applied Segment Clustering on the OnlineSalesKey column like for the basic Clustered Columnstore previously:
Tuple Mover invocation time:
alter index PK_FactOnlineSales_CCI_Plus_Indexes on dbo.FactOnlineSales_CCI_Plus_Indexes reorganize;
Wow!
It took not even a second to execute it!
But wait a second second … Did it do anything at all ?
Nope! Nothing has changed.
Even though we have deleted almost 20% of the content of the Row Group, Tuple Mover did not advanced a bit!
Why?
Let me make a guess that it has to do with the Mapping Index structure, that stores the connections between Clustered Columnstore Index and the b-tree indexes.
The Mapping Index stores the RowGroup ID and the position within the Row Group for each of the rows inside the Columnstore Index (Row Group Id:position), so that there is connection between b-tree index and the columnstore one. If you are looking more depth on this topic, then you can start with Columnstore Indexes – part 55 (“New Architecture Elements in SQL Server 2016”).
Moving Row Group means that there will be a lot of operations (over 800K in this case) over the Mapping Index, which is definitely not cheap in the terms of the CPU (scanning over billions of entries and updating them online is not cheap, this lab test with 1 row group and 12.6 Million rows is nothing).
But hey – I think its not too expensive!
I suspect that this is simply a question of implementations priorities.
For Data Warehousing there is no question that people need to use b-tree indexes and there will be no real understanding for the scenarios such as this one, where the real behind-the-scenes difference is known to not too many it professionals.
Let’s see if deleting the whole segment will allow the Tuple Mover to do at least the cleanup task:
;with delCTE as ( select top (850000) OnlineSalesKey from dbo.FactOnlineSales_CCI_Plus_Indexes del order by OnlineSalesKey ) delete from delCTE;
alter index PK_FactOnlineSales_CCI_Plus_Indexes on dbo.FactOnlineSales_CCI_Plus_Indexes reorganize;
Nothing.
Like really nothing happens here.
All the same Row Groups – with 1 completely deleted Row Group at the very same location as before.
This means that the newest kid on the block (nonclustered b-tree indexes for Clustered Columnstore Index) prohibit all the juicy improvements for the Row Groups maintenance tasks.
No intergroup merge or clean here to be found.
I am definitely creating a connect item on this one.
I will be fine with an additional hint in the ALTER INDEX … REORGANIZE statement, like in the case of the open Delta-Stores, but I am not happy to see SQL Server 2016 staying on the level of 2014 when a new feature arrives.
Updatable Nonclustered Columnstore Index
In SQL Server 2016 we have 2 types of Operational Analytics feature: disk-based (updatable nonclustered columnstore indexes) & in-memory (updatable clustered columnstore on in-memory tables).
Nonclustered Columnstore Index stayed closed for any updates for 2 versions (SQL Server 2012 & SQL Server 2014) but in SQL Server 2016 we can finally have it updatable. Why is this so important ?
Because we can choose the columns for the analytical queries instead of putting every single one like in the case with the Clustered Index.
Without wasting your time any further, let us create a new table with a clustered index and a nonclustered columnstore index as well, and let’s try to do the same operations as we did for the simple clustered columnstore index as well as for the clustered columnstore index with secondary b-tree index:
CREATE TABLE [dbo].[FactOnlineSales_NCCI]( [OnlineSalesKey] [int] NOT NULL, [DateKey] [datetime] NOT NULL, [StoreKey] [int] NOT NULL, [ProductKey] [int] NOT NULL, [PromotionKey] [int] NOT NULL, [CurrencyKey] [int] NOT NULL, [CustomerKey] [int] NOT NULL, [SalesOrderNumber] [nvarchar](20) NOT NULL, [SalesOrderLineNumber] [int] NULL, [SalesQuantity] [int] NOT NULL, [SalesAmount] [money] NOT NULL, [ReturnQuantity] [int] NOT NULL, [ReturnAmount] [money] NULL, [DiscountQuantity] [int] NULL, [DiscountAmount] [money] NULL, [TotalCost] [money] NOT NULL, [UnitCost] [money] NULL, [UnitPrice] [money] NULL, [ETLLoadID] [int] NULL, [LoadDate] [datetime] NULL, [UpdateDate] [datetime] NULL ) ON [PRIMARY] insert into [dbo].[FactOnlineSales_NCCI] with (tablockx) (OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate) select OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate from dbo.FactOnlineSales; create clustered index PK_FactOnlineSales_NCCI on dbo.FactOnlineSales_NCCI (OnlineSalesKey) with (data_compression = page); create nonclustered index Index_FactOnlineSales_NCCI on dbo.FactOnlineSales_NCCI (OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate) with (--drop_existing = on, data_compression = page, maxdop = 1); create nonclustered columnstore index Index_FactOnlineSales_NCCI on dbo.FactOnlineSales_NCCI (OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate) with (drop_existing = on, maxdop = 1); GO
With CISL functions cstore_GetRowGroups & cstore_GetRowGroupsDetails we can take a look at the overall situation with the Row Groups as well as have them all listed:
exec dbo.cstore_GetRowGroups @tableName = 'FactOnlineSales_NCCI'; exec dbo.cstore_GetRowGroupsDetails @tableName = 'FactOnlineSales_NCCI';
Looks so similar to the Clustered Columnstore Index scenarios that one would wonder if its really the same one.
Don’t forget that this is just the list of the compressed Row Groups which are the same, but other internal structures are different – for Nonclustered Columnstore we have additional Deleted Buffers – for more information please read Columnstore Indexes – part 55 (“New Architecture Elements in SQL Server 2016”) and Columnstore Indexes – part 69 (“Operational Analytics – Disk Based”).
Now let’s delete 200.000 rows from our rowstore table with an updatable nonclustered columnstore index:
;with delCTE as ( select top (200000) OnlineSalesKey from dbo.FactOnlineSales_NCCI del order by OnlineSalesKey ) delete from delCTE;
Let’s take a look at the effect of our operation:
exec cstore_GetRowGroupsDetails @tableName = 'FactOnlineSales_NCCI';
Surprise! There are no deleted rows here in the Row Groups to be found.
This happens because this information at the moment is stored within Deleted Buffer.
At the moment unreleased version of the function cstore_GetRowGroups will do this work for you and reveal that rows are removed (this information can be found in the new sys.internal_partitions DMV):
exec cstore_GetRowGroups @tableName = 'FactOnlineSales_NCCI';
Here is a query that will display more detailed information about the internal partitions of the dbo.FactOnlineSales_NCCI table:
select object_name(part.object_id) as TableName, ind.name as IndexName, part.index_id, part.hobt_id, part.internal_object_type, part.internal_object_type_desc, part.row_group_id, part.rows, part.data_compression, part.data_compression_desc from sys.internal_partitions part left outer join sys.indexes ind on part.object_id = ind.object_id and part.index_id = ind.index_id where part.object_id = object_id('dbo.FactOnlineSales_NCCI')
As I have shown in Columnstore Indexes – part 69 (“Operational Analytics – Disk Based”), the first invocation of the Tuple Mover will synchronise Deleted Bitmap with the Deleted Buffer, and so let’s see if something else will happen if we execute it now:
alter index Index_FactOnlineSales_NCCI on dbo.FactOnlineSales_NCCI reorganize;
Cool, we have synched our deleted rows, but the Row Group is intact!
Being a constant pain in the back, I try to repeat if it did not bring the desired effect on the first attempt:
alter index Index_FactOnlineSales_NCCI on dbo.FactOnlineSales_NCCI reorganize;
I actually had to execute it twice to make it work the way I wanted. I assume that this is another bug and will be corrected before RTM.
Let’s take another look at our Row Groups:
exec cstore_GetRowGroupsDetails @tableName = 'FactOnlineSales_NCCI';
Now that is exactly what I was looking for ! The Row Group with ID = 0 turned into a Row Group with ID = 13 and with just 848576 rows as expected!
Needed to execute a couple of times, but that’s actually a good thing – because it will take off some of the direct impact on the OLTP workloads (this is an OLTP reporting scenario, do not forget that).
I just wish to have a possibility to do it at once. For some workloads getting this whole process in one step during maintenance window would very nice, because the decision to repeat Tuple Mover invocation might get a little bit tricky – there will be a tipping point for the cost.
The next thing to do is to check on the Cleanup process and for that we need to delete the next 850.000 rows:
;with delCTE as ( select top (800000) OnlineSalesKey from dbo.FactOnlineSales_NCCI del order by OnlineSalesKey ) delete from delCTE;
And after a couple of invocations of the Tuple Mover
alter index Index_FactOnlineSales_NCCI on dbo.FactOnlineSales_NCCI reorganize;
This is what you should see when using cstore_GetRowGroupsDetails function:
, which after a couple of minutes will turn itself into a more clean version without any Tombstones:
Its great to see that Nonclustered Columnstore do have this important improvement for cleaning & merging Row Groups, rest to see some of the bug fixes and the possibility to execute whole process at once.
InMemory Clustered Columnstore Index
The engine of the future is the InMemory and every Database vendor has moved into this space. With SQL Server 2014 we have received the InMemory OLTP (also known as Hekaton) and now in SQL Server 2016 we have received what so many people was looking forward to – the conjunction of InMemory tables (InMemory OLTP) and Columnstore Technology by introducing InMemory Clustered Columnstore Index. For more information on the internals please visit Columnstore Indexes – part 72 (“InMemory Operational Analytics”).
I will be going into more details of InMemory Operational Analytics (InMemory Columnstore) in the future blog posts, but for now I will focus on the Merge process.
Let’s create a copy of InMemory table with Clustered Columnstore (notice that you can only create a Clustered Columnstore in SQL Server 2016 and it should be created in the table definition), loading all the data from the FactOnlineSales table into it.
Notice that you will need some significant amount of available memory & work memory for this exercise. You might need to edit your Resource Governor settings and have at least 12-16 GB RAM to be able to run this script:
CREATE TABLE [dbo].[FactOnlineSales_Hekaton]( [OnlineSalesKey] [int] NOT NULL, [DateKey] [datetime] NOT NULL, [StoreKey] [int] NOT NULL, [ProductKey] [int] NOT NULL, [PromotionKey] [int] NOT NULL, [CurrencyKey] [int] NOT NULL, [CustomerKey] [int] NOT NULL, [SalesOrderNumber] [nvarchar](20) NOT NULL, [SalesOrderLineNumber] [int] NULL, [SalesQuantity] [int] NOT NULL, [SalesAmount] [money] NOT NULL, [ReturnQuantity] [int] NOT NULL, [ReturnAmount] [money] NULL, [DiscountQuantity] [int] NULL, [DiscountAmount] [money] NULL, [TotalCost] [money] NOT NULL, [UnitCost] [money] NULL, [UnitPrice] [money] NULL, [ETLLoadID] [int] NULL, [LoadDate] [datetime] NULL, [UpdateDate] [datetime] NULL, Constraint PK_FactOnlineSales_Hekaton PRIMARY KEY NONCLUSTERED ([OnlineSalesKey]), INDEX NCCI_FactOnlineSales_Hekaton CLUSTERED COLUMNSTORE --(OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate) ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA); -- Insert all 12.6 Million Rows from dbo.FactOnlineSales insert into dbo.FactOnlineSales_Hekaton select * from dbo.FactOnlineSales order by OnlineSalesKey desc
The next step for InMemory Operational Analytics is to invoke the compression process (Tuple Mover?) and it is done in 2 steps:
1. Getting the Object_ID for the InMemory table
2. Invoking the sys.sp_memory_optimized_cs_migration stored procedure with the object id that was selected in the first step:
-- Select Object_ID for the InMemory OLTP Table select object_id('dbo.FactOnlineSales_Hekaton'); -- Substitute the object_id with your own from the previous statement exec sys.sp_memory_optimized_cs_migration @object_id = 286624064
Let’s take a look at the Row Groups that were generated by the loading & conversion process:
exec cstore_GetRowGroupsDetails @tableName = 'FactOnlineSales_Hekaton';
The Row Group with the ID = -1 is nothing else then the Tail Row Group which is a endless Delta-Store (landing zone for all new information). The rest of Columnstore Index structure greatly resembles the Columnstore Indexes for the disk-based solutions, but strangely there is no dictionary pressure since xVelocity engine compressed data in a different way. There will be further investigation on this topic, and for the moment I am focusing on the InMemory Columnstore Row Groups merging capabilities.
Let’s kick off with deletion of 800.000 rows from the InMemory table:
;with delCTE as ( select top (800000) OnlineSalesKey from dbo.FactOnlineSales_Hekaton del order by OnlineSalesKey ) delete from delCTE;
It’s time to verify our Row Groups:
exec cstore_GetRowGroupsDetails @tableName = 'FactOnlineSales_Hekaton';
Well, my deletes were spread between 2 different Row Groups with 1 Row Group’s data being 100% deleted. And after a couple of seconds the fully deleted Row Group is removed from the Columnstore Structure as you can see on the following screenshot:
Let’s see what can be done with the Tuple Mover:
alter index NCCI_FactOnlineSales_Hekaton on dbo.FactOnlineSales_Hekaton reorganize;
Msg 10794, Level 16, State 14, Line 3
The operation ‘ALTER INDEX’ is not supported with memory optimized tables.
Nothing, because right now we can’t invoke Alter Index … Reorganize on InMemory tables.
That is very disappointing, but what about REBUILD:
alter table dbo.FactOnlineSales_Hekaton rebuild;
Msg 10794, Level 16, State 133, Line 4
The operation ‘ALTER TABLE REBUILD’ is not supported with memory optimized tables.
That’s a real bummer – we can’t rebuild InMemory table either.
This also means that we won’t be able any Segment Clustering on the InMemory tables in SQL Server 2016.
As much as I see InMemory Columnstore – it will be an alpha technology on the level of SQL Server 2012 when compared to the mature disk-based offerings in SQL Server 2016.
Let’s try to delete more data to see what happens:
;with delCTE as ( select top (800000) OnlineSalesKey from dbo.FactOnlineSales_Hekaton del order by OnlineSalesKey ) delete from delCTE;
Now the Row Group with ID = 12 has gone, because we deleted it completely … I get it, but there is another threshold for InMemory Columnstore tables and it is based on the 90% of the data being deleted.
Let’s try to delete the last 950.000 rows to see if we manage to affect the Row Group with ID = 1:
;with delCTE as ( select top (950000) OnlineSalesKey from dbo.FactOnlineSales_Hekaton del order by OnlineSalesKey desc ) delete from delCTE;
With the help from cstore_GetRowGroupsDetails function from CISL, let’s see what is the effect of this operation:
exec dbo.cstore_GetRowGroupsDetails @tableName = 'FactOnlineSales_Hekaton';
Notice that our Row Group with ID = 1 has disappeared and that right now the Tail Row Group has grown and it has 98.756 rows! This happened because when deleting over 90% of the Row Group data, then the InMemory Row Group will get removed and the rest of the active data will be moved into the Tail Row Group – I have explained this in Columnstore Indexes – part 72 (“InMemory Operational Analytics”).
This confirms that Self-Merge works for InMemory Clustered Columnstore Indexes but with different conditions (90% of deleted data instead of just 10% for disk-based ones) and the action of moving active data into the Tail Row Group is absolutely brilliant – the probabilities are extremely high that this active data will be modified in the next moments, and Columnstore Index is not the right place for the Hot Data that is being constantly modified.
From my point of view, InMemory Clustered Columnstore are still very immature growing, with no Reorganize or Rebuild processes, with no possibility to modify or to truncate table – and most importantly the good old technics such as Segment Clustering do not work at all in this scenario.
Even though much later I have realised that running the stored procedure “sys.sp_memory_optimized_cs_migration” does a similar job to what ALTER INDEX REORGANIZE does for the disk-based tables.
Final Thoughts
Its great to see support for Merging Groups and their Cleanup but there must be much work done in this direction in order to claim that merging & cleaning is the process that works easily & swiftly for all types of Columnstore Indexes.
As for the moment, here is the table with all current Alter Index … Reorganize compatibilities:
Technology | Self Merge | Intergroup Merge | Cleanup |
---|---|---|---|
Smple Disk-Based Clustered Columnstore | yes | yes | yes |
Disk-Based Clustered Columnstore with secondary b-tree indexes | yes | yes | yes |
Nonclustered Columnstore Index | yes | yes*1 | yes |
InMemory Clustered Columnstore Index | yes*2 | no | yes |
*1 – you will need to execute Alter Index … Reorganize at least twice, since the rows are moved from Deleted Buffer into the Deleted Bitmap on the first invocation
*2 – the data resting after 90% of the deleted rows inside a Row Group is moved into the Tail Row Group and can be recompressed later. This way it will work as a 2 phase Self-Merge.
Use sys.sp_memory_optimized_cs_migration for invoking the Merge process for the InMemory tables.
Let us hope that these situations will be improved before the final release of SQL Server 2016.
to be continued with Columnstore Indexes – part 75 (“Stretch DB & Columnstore”)