Blog Post

Columnstore Indexes – part 71 (“Change Data Capture, Change Tracking & Temporal”)

,

Continuation from the previous 70 parts, the whole series can be found at http://www.nikoport.com/columnstore/.

In the SQL Server world of data modification tracking, there are 2 very important features, which are called Change Data Capture (CDC) and Change Tracking (CT). Both of them allow you to understand the changes done to the monitored table. Notice that they are implemented in the different way, and have different goals.

This blogpost is dedicated to their integration and collaboration with the Columnstore Indexes in SQL Server 2016.

First of all, let’s consider the reasons for not enabling this functionality in SQL Server 2012 & SQL Server 2014:

In SQL Server 2012 we have had only Nonclustered Columnstore Indexes, which were non-updatable, implementing support for CDC & CT have simply made no sense whatsoever.

In SQL Server 2014 the Nonclustered Columnstore Indexes were kept being unupdatable, and so it was still nonsensical to make them support CDC or CT. As for the Clustered Columnstore Indexes in SQL Server 2014, they were implemented as updatable, but unfortunately both functionalities stayed out of the scope for that release.

In SQL Server 2016, we have received support for both of the scenarios, but lets take it slow and 1 by 1 diving into some of the details and tests:

Change Data Capture

The Change Data Capture is the feature that was originally implemented for the good old SQL Server 2008, with the objective of capturing data modification statements, executed against SQL Server tables. There were some important enhancement since the original version, like the addition of SSIS components in SQL Server 2012, but largely the feature has been set in the original release.

In SQL Server 2016 Clustered Columnstore Indexes have received the support of the secondary Nonclustered B-Tree Indexes, including the Unique ones, you can find more information and example on this topic at the blogpost dedicated to the Clustered Columnstore Indexes Architectural Improvements for SQL Server 2016.

From the logical perspective, in order to have support for CDC, we shall need to build a table with Clustered Columnstore Index, and we shall have to add a Unique Nonclustered Index, which we shall indicate CDC to use for collection of data modifications, if we are looking to get the net changes, while for the default implementation it is not required. But how should that work ? Given that Clustered Columnstore is the primary Index, we shall have to read its original data from the Delta-Store or from the compressed Row Group, where in the second case the performance will never be something that can be used in the real world scenario…

My thoughts are that the CDC support for Clustered Columnstore Index requires a unuseful overload for enabling it on a real Fact table, because in Data Warehousing environments doing ETL would allow you to save the modifications directly within ETL process.

For the start, let’s restore a fresh copy of ContosoRetailDW database and apply the usual modifications, such as setting Compatibility Level to 130, setting SA to be the Database Owner and increasing the default sizes for the Data File & Transaction Log:

USE [master]
 
alter database ContosoRetailDW
    set SINGLE_USER WITH ROLLBACK IMMEDIATE;
 
RESTORE DATABASE [ContosoRetailDW] 
    FROM  DISK = N'C:\Install\ContosoRetailDW.bak' WITH  FILE = 1,  
        MOVE N'ContosoRetailDW2.0' TO N'C:\Data\ContosoRetailDW.mdf', 
        MOVE N'ContosoRetailDW2.0_log' TO N'C:\Data\ContosoRetailDW.ldf',  
        NOUNLOAD,  STATS = 5;
 
alter database ContosoRetailDW
    set MULTI_USER;
GO
GO
ALTER DATABASE [ContosoRetailDW] SET COMPATIBILITY_LEVEL = 130
GO
ALTER AUTHORIZATION ON DATABASE::ContosoRetailDW to sa;
GO
ALTER DATABASE [ContosoRetailDW] MODIFY FILE ( NAME = N'ContosoRetailDW2.0', SIZE = 2000000KB , FILEGROWTH = 128000KB )
GO
ALTER DATABASE [ContosoRetailDW] MODIFY FILE ( NAME = N'ContosoRetailDW2.0_log', SIZE = 400000KB , FILEGROWTH = 256000KB )
GO

Now we can drop Clustered B-Tree Index and create a Clustered Columnstore Index on our dbo.FactOnlineSales table:

use ContosoRetailDW;
GO
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [PK_FactOnlineSales_SalesKey];
create clustered columnstore Index PK_FactOnlineSales
on dbo.FactOnlineSales;

To enable Change Data Capture (CDC), we shall need to execute the following script, which will enable it on the database level:

exec sys.sp_cdc_enable_db;

Now, let’s try to enable CDC on our test table (notice that I am using most of the default parameters while explicitly setting NET CHANGES to 1 (track all net changes), because this is what I expect from my CDC-enabled table:

exec sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name   = N'FactOnlineSales',
@role_name     = NULL,
@filegroup_name = N'PRIMARY',
@supports_net_changes = 1;

Look at the error message that I have received:

Msg 22939, Level 16, State 1, Procedure sp_cdc_enable_table_internal, Line 201
The parameter @supports_net_changes is set to 1, but the source table does not have a primary key defined and no alternate unique index has been specified.

Yeah, I am missing a unique index in my table, because I have removed my Clustered B-Tree Index previously.

Let’s add a Unique Nonclustered B-tree Index on the SalesKey column and try to execute the sys.sp_cdc_enable_table again:

create unique nonclustered index UQ_FactOnlineSales
on dbo.FactOnlineSales (OnlineSalesKey)
with (DATA_COMPRESSION = PAGE);

EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name   = N'FactOnlineSales',
@role_name     = NULL,
@filegroup_name = N'PRIMARY',
@supports_net_changes = 1,
@index_name = 'UQ_FactOnlineSales';

This time I have received the following message:

Msg 35353, Level 16, State 1, Procedure sp_cdc_enable_table_internal, Line 215
Change Data Capture cannot be enabled on a table with a clustered columnstore index.  
Consider dropping clustered columnstore index 'PK_FactOnlineSales' on table 'FactOnlineSales'.

This means that Clustered Columnstore Indexes are not supporting CDC and that’s perfectly fine from my personal point of view.

Notice that trying to enable the CDC even without net changes support, will result in the very same error, meaning that this scenario is not being supported in the moment.

What about the Nonclustered Columnstore Indexes ?

For the test, let’s use the different table dbo.FactSales – beginning with the creation of the updatable Nonclustered Columnstore Index:

create nonclustered columnstore index [NCCI_FactSales_SalesKey]
on [dbo].[FactSales] (SalesKey, DateKey, channelKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, UnitCost, UnitPrice, SalesQuantity, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, SalesAmount, ETLLoadID, LoadDate, UpdateDate)

Now, let’s enable the CDC scenario on it, using the same parameters as we have had try on the dbo.FactOnlineSales table:

EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name   = N'FactSales',
@role_name     = NULL,
@filegroup_name = N'PRIMARY',
@supports_net_changes = 1

This time it has worked without any error message!

To confirm that it was succesfull, let’s execute the sys.sp_cdc_help_change_data_capture system stored procedure, which will show all tables that are configured for the Change Data Capture:

exec sys.sp_cdc_help_change_data_capture;

sys.sp_cdc_help_change_data_capture

Magnificent, so that means that we have a Rowstore B-Tree table with an updatable Nonclustered Columnstore Index that supports CDC. This makes perfectly sense, because all we actually need to make it work with the traditional Rowstore structure, simply ignoring the secondary Columnstore Index.

But let’s try to insert, update & delete some data from our table, to make it sure, that everything works fine:

delete top (10) from dbo.FactSales;

The deletion operation was a success, and we can confirm it with the execution of the following statement that will count the rows from the cdc.dbo_FactSales_CT table, the default system table that stores modifications from our original dbo.FactSales table:

select count(*) as ModificationsCount
from [cdc].[dbo_FactSales_CT];

In the result, I have received exactly what I have expected – 10 rows.

Now let’s update some data:

update top (20) dbo.FactSales
set UpdateDate = GetDate();

Let’s see if we get 50 rows from the cdc.dbo_FactSales_CT table meaning that we have received 40 modifications, because 20 modifications are stored as 20 deletes and 20 inserts:

select count(*) as ModificationsCount
from [cdc].[dbo_FactSales_CT];

Success! ??

Change Tracking

Change Tracking is a solution that provides opportunity with a lightweight tracking of changes made to a table.

To set it up for your database is very easy – its enough to execute the following command:

alter database ContosoRetailDW
set change_tracking = ON  (CHANGE_RETENTION = 10 DAYS, AUTO_CLEANUP = ON);

Notice that you can also set up all of these properties by using the properties dialog of the database that you are configuring.

Since Change Tracking requires a primary key on the tracked table, we shall need to set up our Clustered Columnstore Table with a Primary Key:

alter table dbo.FactOnlineSales
add constraint PK_FactOlnineSales
primary key Nonclustered (OnlineSalesKey);

Now, let’s enable Change Tracking on our Clustered Columnstore table dbo.FactOnlineSales

alter table dbo.FactOnlineSales
enable change_tracking
with (track_columns_updated = on);

Well, again for Clustered Columnstore Index we have received an error message:

Msg 35353, Level 16, State 1, Line 1
Change Tracking cannot be enabled on a table with a clustered columnstore index.  
Consider dropping clustered columnstore index 'PK_FactOnlineSales' on table 'FactOnlineSales'.

This means that the improvements in tracking for Clustered Columnstore Indexes are not included in SQL Server 2016. As I have written above for the Change Data Capture functionality tests – it does not shock me at all, if Microsoft will never implement it.

Let’s get back to our dbo.FactSales, the one that has a Nonclustered Columnstore Index and check if there is a support for CT (Change Tracking):

alter table dbo.FactSales
enable change_tracking
with (track_columns_updated = on);

As expected, everything functioned perfectly in this case – there were an “Command(s) completed successfully.” message.

To verify if our table is being set up for tracking successfully, let’s read data from the sys.change_tracking_tables DMV:

select object_name(object_id), *
from sys.change_tracking_tables;

sys.change_tracking_tables

Looks great!

Let’s delete some data:

delete top (10)
from dbo.FactSales;

We can verify that the data is being tracked correctly by issuing the following command:

select * 
from ChangeTable (CHANGES FactSales,0) as CtColumnstore

ChangeTable Columnstore

I am very glad to see both of the functionalities getting support for the Nonclustered Columnstore Indexes in SQL Server 2016, this makes Operational Analytics a kind of solution that many people will be able to use without a lot of additional effort.

Temporal

Temporal is the new tracking mechanism that is implemented in SQL Server 2016, and I am considering to be one of the finest addition to the engine of my favourite database. If you are interested in it – considering visiting amazing articles on Temporal written by Itzik Ben-Gan for the SQL Server PRO Magazine.

Let’s consider a new table dbo.FactOnlineSales and enable Temporal on it by issuing the following commands that will drop existing Primary Key, Create a new Clustered Columnstore Index, add the necessary 2 columns for the Temporal:

ALTER TABLE [dbo].[FactInventory] DROP CONSTRAINT [PK_FactInventory_InventoryKey]
GO
create clustered columnstore index PK_FactInventory
on dbo.FactInventory;
GO
alter table dbo.FactInventory add
    temp_start DATETIME2(0) GENERATED ALWAYS AS ROW START NOT NULL
CONSTRAINT DFT_FactInventory_temp_start DEFAULT('19000101'),
    temp_end DATETIME2(0) GENERATED ALWAYS AS ROW END NOT NULL
CONSTRAINT DFT_FactInventory_temp_end DEFAULT('99991231 23:59:59'),
    PERIOD FOR SYSTEM_TIME (temp_start, temp_end);

Now, we can activate the Temporal System Versioning on our table:

ALTER TABLE dbo.FactInventory
    SET ( SYSTEM_VERSIONING = ON ( HISTORY_TABLE = dbo.FactInventoryHistory ) );

Well, I have received the following error message, saying that the primary key is missing in this table:

Msg 13553, Level 16, State 1, Line 12
System versioned temporal table 'ContosoRetailDW.dbo.FactInventory' must have primary key defined.

Let’s add a new primary key (nonclusrtered) to our FactInventory table and try again:

alter table dbo.FactInventory
add constraint PKey_FactInventory
primary key Nonclustered (InventoryKey);
ALTER TABLE dbo.FactInventory
  SET ( SYSTEM_VERSIONING = ON ( HISTORY_TABLE = dbo.FactInventoryHistory ) );

Perfect! This time it has worked like a charm! ??

Let’s take it for a small ride by deleting a couple of rows:

delete top (10)
from dbo.FactInventory;

As before, to verify I will simply check the pre-configured table dbo.FactInventoryHistory for the 10 entries:

select count(*)
from dbo.FactInventoryHistory

This has worked as expected! This looks great!

Let’s check if the Temporal supports Nonclustered Columnstore Indexes as well – we shall need to drop the Clustered Columnstore Index manually and create the Nonclustered Columnstore Index brand new since the conversion from one type to another does not funciton:

DROP INDEX [PK_FactInventory] ON [dbo].[FactInventory];
CREATE NONCLUSTERED COLUMNSTORE INDEX [PK_FactInventory] ON [dbo].[FactInventory] 
(InventoryKey, DateKey, StoreKey, ProductKey, CurrencyKey, OnHandQuantity, OnOrderQuantity, SafetyStockQuantity, UnitCost, DaysInStock, MinDayInStock, MaxDayInStock, Aging, ETLLoadID, LoadDate, UpdateDate, temp_start, temp_end);

This script executed without any error, thus confirming that Nonclustered Columnstore Index is supporting Temporal as well.

Conclusion

Its amazing that we have finally received support for Nonclustered Columnstore Indexes in combinations with the Change Data Capture and Change Tracking. I think that given finite resources the possible decision of not implementing support for the Clustered Columnstore Indexes is very well thought.

The support of the Temporal feature for both Clustered as well as the Nonclustered Columnstore Indexes gives a great alternative, if for some specific reason one needs to use Clustered Columnstore Indexes coupled with Data Tracking features.

to be continued …

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating