Continuation from the previous 33 parts, starting from http://www.nikoport.com/2013/07/05/clustered-columnstore-indexes-part-1-intro/
Segment Elimination for Deleted Row Groups
After some very interesting and inspiring conversations during SQLBits, I came to an idea of testing a Segment Elimination while working with a fully deleted Segment:
Imagine that you are regularly updating your Columnstore Table and it reaches a certain moment when all data inside a compressed Segment is deleted.
What happens to the Segment elimination process executed by Query Optimiser ?
Will it check on the meta-information level if each of the present rows inside a particular Segment are eliminated and thus reading this segment does not make any sense at all ?
Read this blog post to find it out:
Let us kick of with the a simple setup script, that will create a simple table with just 1 column, which shall be defined as an identity by default.
create table dbo.MaxDataTable( c1 bigint identity(1,1)); — Create a Clustered Columnstore Index: create clustered columnstore index PK_MaxDataTable on dbo.MaxDataTable; — Insert 2 Sequential segments full of default values declare @i as int; declare @max as int; select @max = isnull(max(C1),0) from dbo.MaxDataTable; set @i = 1; begin tran while @i <= 1048576*2 begin insert into dbo.MaxDataTable default values set @i = @i + 1; end; commit; — Invoke Tuple Mover and for all Delta Stores Compression: alter index PK_MaxDataTable on dbo.MaxDataTable Reorganize with (COMPRESS_ALL_ROW_GROUPS = ON);
Now we can check the situation with our Row Groups, and for this purpose I shall use the following script:
-- Clustered Columnstore Indexes Analysis SELECT seg.segment_id, min_data_id, max_data_id, row_count, deleted_rows ,state_description, data_compression_desc --,* FROM sys.column_store_segments AS seg INNER JOIN sys.partitions AS p ON seg.hobt_id = p.hobt_id INNER JOIN sys.column_store_row_groups rg ON p.object_id = rg.object_id and seg.segment_id = rg.row_group_id WHERE p.object_id = object_id('MaxDataTable') and seg.column_id = 1 order by seg.segment_id
As you can see, we have a perfect column clustering on our C1 column, with numbers sequentially occupying their place between 2 compressed Segments.
Now we can test our Table by invoking a test Query for determining if Segment Elimination is working as expected in a situation without any deleted rows.
-- Turn on traces 3605 & 646 dbcc TraceOn(3605,-1); dbcc TraceOn(646,-1); dbcc ErrorLog; go -- A simple query select C4, count(C1) from dbo.MaxDataTable where C1 = 10 group by C4; go -- Turn off both traces dbcc TraceOff(3605,-1); dbcc TraceOff(646,-1); -- Read error log exec sys.xp_readerrorlog
Now we can test our Table by invoking a test Query for determining if Segment Elimination is working as expected in a situation without any deleted rows:
Everything looks fine our second Row Group (id =3) was eliminated from the final result, this is exactly what we have expected.
Lets Delete half of the available rows, all those which do belong to the first of our 2 Segments:
delete from dbo.MaxDataTable where C1 <= 1048576;
We need to determine if anything has changed in our meta-information, and so let us run a query to check on the compressed Segments:
-- Clustered Columnstore Indexes Analysis SELECT seg.segment_id, min_data_id, max_data_id, row_count, deleted_rows ,state_description, data_compression_desc --,* FROM sys.column_store_segments AS seg INNER JOIN sys.partitions AS p ON seg.hobt_id = p.hobt_id INNER JOIN sys.column_store_row_groups rg ON p.object_id = rg.object_id and seg.segment_id = rg.row_group_id WHERE p.object_id = object_id('MaxDataTable') and seg.column_id = 1 order by seg.segment_id
We need to determine if anything has changed in our meta-information, and so let us run a query to check on the compressed Segments:
Quite visible that in our DeletedBitmap for the first of our Segments we have the very same amount of rows that are located in our first Segment – 1048567. This is means that our first Segment has become practically useless, because it does not contain any data at all (all rows are now deleted).
The first question naturally will be what happens with this Segment – and the answer is Nothing. It will stay this way until we shall rebuild our table or partition containing the respective Segment.
I will get back to this topic in the end of this blog post.
For now let us invoke our test query again with the same predicate id (10) again, to see if we can get some Segment Elimination out of Query Optimiser:
-- Turn on traces 3605 & 646 dbcc TraceOn(3605,-1); dbcc TraceOn(646,-1); dbcc ErrorLog; go -- A simple query select count(C1) from dbo.MaxDataTable where C1 = 10 go -- Turn off both traces dbcc TraceOff(3605,-1); dbcc TraceOff(646,-1); -- Read error log exec sys.xp_readerrorlog
Oh well, we are not very lucky, Query Optimiser has ignored only the second segment while reading the first one, where the value 10 should be contained, but only in this case as we have seen, each value of the Segment has already been marked as deleted in Deleted Bitmap.
Final Thoughts
I imagine that it might be considered quite a waste of time to check deleted bitmaps information and if a Segment has become useless, even though I could argue that such checks by default for DataWarehousing queries are not that expensive, given the nature and specificity of the long running queries, especially because in such environments there are not a big number of them.
As an alternative, we need some kind of mechanism which would allow us to delete those Segments without invoking REBUILD functionality, because at the moment we would automatically loose our well prepared Segment Clustering and because it might not make any sense at all.
I would be more than happy to have some kind of hint for Tuple Mover command, such as:
ALTER INDEX REORGANIZE with (REMOVE_ALL_DELETED_SEGMENTS = ON);
which would analyse such Segments and remove them completely.
I hope that this blog post shall receive an update soon …
Update on 28th of July 2014: A Connect item has been filed suggesting some improvements into the maintenance procedures for Columnstore Indexes.
to be continued with Clustered Columnstore Indexes – part 35 (“Trace Flags & Query Optimiser Rules”)