Continuation from the previous 107 parts, the whole series can be found at http://www.nikoport.com/columnstore/
As we are approaching the RTM (release to manufacturing) of the SQL Server 2017, I decided to focus on the implemented features in the upcoming release, thinking that if they are in the RC (release candidate) than most probably they will be here to stay and some of the next blog posts will be focusing on the features released in the CTP’s of the SQL Server 2017 that I have not mentioned before.
The Release Candidate 1 of the SQL Server 2017 has brought one very significant and extremely long overdue feature support for the Columnstore Indexes. I am talking about the feature that was originally available in the SQL Server 2012 RTM and was taken away from the engine in the CU1 or CU2. Yes, that feature worked for the Nonclustered Columnstore Indexes that were read-only at the time of SQL Server 2012. The feature that in the Banking and Health industries are extremely widely used: Computed Columns.
For the SQL Server 2017 RC1 Microsoft silently added the possibility of using computed columns for the Clustered Columnstore Indexes and this might make a huge difference for some companies when deciding wether migration to SQL Server 2017 is worth it. From my own experience there were some scenarios were clients would get upset by the idea of moving away from the computed columns just because Columnstore Indexes did not support them, so that after asking for a long time about the computed columns I even created Include support for Calculated Columns on Columnstore Indexes.
So when I found out that it was implemented I confess that I jumped for a joy a couple of times, I was so excited about it. My huge thanks goes to Jovan Popovic (Microsoft PM) for pointing out at it, since without a major announcement I did not found out this news on my own.
I had already a VM on Azure running with a RC1 of the SQL Server 2017 and so I decided to advance with some basic tests to discover the supportability of the computed columns and below you will find my tests that I have run on the restored version of the ContosoRetailDW free database:
/* * Queries for testing Columnstore Indexes * by Niko Neugebauer (http://www.nikoport.com) * These queries are to be run on a Contoso BI Database (http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=18279) * * Restores ContosoRetailDW database from the source, which should be placed in C:\Install * */USE [master] alter database ContosoRetailDW set SINGLE_USER WITH ROLLBACK IMMEDIATE; RESTORE DATABASE [ContosoRetailDW] FROM DISK = N'C:\Install\ContosoRetailDW.bak' WITH FILE = 1, MOVE N'ContosoRetailDW2.0' TO N'C:\Data\SQL16\ContosoRetailDW.mdf', MOVE N'ContosoRetailDW2.0_log' TO N'C:\Data\SQL16\ContosoRetailDW.ldf', NOUNLOAD, STATS = 1; alter database ContosoRetailDW set MULTI_USER; GO GO ALTER DATABASE [ContosoRetailDW] SET COMPATIBILITY_LEVEL = 140 GO use ContosoRetailDW; ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [PK_FactOnlineSales_SalesKey]; create clustered columnstore Index PK_FactOnlineSales on dbo.FactOnlineSales
I decided to create a new table, that I will call FactOnlineSales_Computed, containing a new column [EarnedAmount] that will be automatically calculated from the following equation that involves 4 different columns of the very same table: ([SalesAmount] – [ReturnAmount] – [DiscountAmount] – [TotalCost]).
Additionally I decided load the whole FactOnlineSales table into the new table – this means 12.6 million rows and run a couple of tests to see how the engine behaves.
DROP TABLE IF EXISTS [dbo].[FactOnlineSales_Computed]; CREATE TABLE [dbo].[FactOnlineSales_Computed]( [OnlineSalesKey] [int] NOT NULL, [DateKey] [datetime] NOT NULL, [StoreKey] [int] NOT NULL, [ProductKey] [int] NOT NULL, [PromotionKey] [int] NOT NULL, [CurrencyKey] [int] NOT NULL, [CustomerKey] [int] NOT NULL, [SalesOrderNumber] [nvarchar](20) NOT NULL, [SalesOrderLineNumber] [int] NULL, [SalesQuantity] [int] NOT NULL, [SalesAmount] [money] NOT NULL, [ReturnQuantity] [int] NOT NULL, [ReturnAmount] [money] NULL, [DiscountQuantity] [int] NULL, [DiscountAmount] [money] NULL, [TotalCost] [money] NOT NULL, [UnitCost] [money] NULL, [UnitPrice] [money] NULL, [ETLLoadID] [int] NULL, [LoadDate] [datetime] NULL, [UpdateDate] [datetime] NULL, [EarnedAmount] as ([SalesAmount] - [ReturnAmount] - [DiscountAmount] - [TotalCost]) ) ON [PRIMARY] TRUNCATE TABLE [dbo].[FactOnlineSales_Computed] ; insert into [dbo].[FactOnlineSales_Computed] with (tablockx) (OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate) select OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate from dbo.FactOnlineSales create clustered index PK_FactOnlineSales_Computed on dbo.FactOnlineSales_Computed (OnlineSalesKey) with (data_compression = page); create clustered columnstore index PK_FactOnlineSales_Computed on dbo.FactOnlineSales_Computed with (drop_existing = on, maxdop = 1);
This script executed without any problem on the SQL Server 2017 RC1, while running the same script on the SQL Server 2016 with Service Pack 1 will result in the following error message:
Msg 35307, Level 16, State 1, Line 34 The statement failed because column 'EarnedAmount' on table 'FactOnlineSales_Computed' is a computed column. Columnstore index cannot include a computed column implicitly or explicitly.
The fact that the we can have computed columns feels my heart with gratitude for finally getting this feature into the engine, this will allow some scenarios that were a total no-go to advance with the Columnstore Indexes, but let us put a couple of queries against the table with the Clustered Columnstore Index in order to find out how it functions and if there are some differences with the traditional RowStore implementation of the computed columns:
-- Take the TOP 10 SalesAmounts for the Earned Amounts above 50.$ select DISTINCT TOP 10 SalesAmount FROM dbo.FactOnlineSales_Computed Where EarnedAmount > 50. ORDER BY SalesAmount DESC;
If you look carefully at the execution plan, you will notice that the Columnstore Index Scan, Compute Scalar and Filter are basically where the magic happens – we scan the original 4 columns, compute the desired value for the [EarnedAmount] and only then filter those values.
Right ? RIIIIHT ?
Let’s set up a different version of the very same table, this time using the Rowstore Indexes and then we shall fire the very same query to compare the result:
DROP TABLE IF EXISTS [dbo].[FactOnlineSales_ComputedRowstore]; CREATE TABLE [dbo].[FactOnlineSales_ComputedRowstore]( [OnlineSalesKey] [int] NOT NULL, [DateKey] [datetime] NOT NULL, [StoreKey] [int] NOT NULL, [ProductKey] [int] NOT NULL, [PromotionKey] [int] NOT NULL, [CurrencyKey] [int] NOT NULL, [CustomerKey] [int] NOT NULL, [SalesOrderNumber] [nvarchar](20) NOT NULL, [SalesOrderLineNumber] [int] NULL, [SalesQuantity] [int] NOT NULL, [SalesAmount] [money] NOT NULL, [ReturnQuantity] [int] NOT NULL, [ReturnAmount] [money] NULL, [DiscountQuantity] [int] NULL, [DiscountAmount] [money] NULL, [TotalCost] [money] NOT NULL, [UnitCost] [money] NULL, [UnitPrice] [money] NULL, [ETLLoadID] [int] NULL, [LoadDate] [datetime] NULL, [UpdateDate] [datetime] NULL, [EarnedAmount] as ([SalesAmount] - [ReturnAmount] - [DiscountAmount] - [TotalCost]) ) ON [PRIMARY] insert into [dbo].[FactOnlineSales_ComputedRowstore] with (tablockx) (OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate) select OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate from dbo.FactOnlineSales create clustered index PK_FactOnlineSales_ComputedRowstore on dbo.FactOnlineSales_ComputedRowstore (OnlineSalesKey) with (data_compression = page);
Let’s execute the same query we have run against the Columnstore Indexes table but this time against our Rowstore table:
select DISTINCT TOP 10 SalesAmount FROM dbo.FactOnlineSales_ComputedRowStore Where EarnedAmount > 50. ORDER BY SalesAmount DESC;
Look, this time we have a predicate pushdown to the storage engine with just 4.9 million rows out of the 12.6 million rows coming out of the Clustered Index Scan, plus there are no computer or filter operations. I definitely wish to see the same functionality on the Columnstore Indexes side, if not now – then fixed in the near future.
Lets see now if we run the query getting the average EarnedAmount out of our table – will we get some surprises on the execution side for both types of the table (Rowstore & Columnstore). Will there be a Aggregate Predicate Pushdown for the Columnstore Indexes?
select AVG([EarnedAmount]) FROM dbo.FactOnlineSales_Computed; select AVG([EarnedAmount]) FROM dbo.FactOnlineSales_ComputedRowStore;
This time the execution plans are essentially the same in the form with 2 Compute Scalar operations where the value for the EarnedAmount is calculated and then aggregated. The execution time is of course slower for the Rowstore, since all relevant operations for the Columnstore Indexes runs in the Batch Execution Mode and this is where we get the biggest speed advantage over the Rowstore Indexes.
Of course the Aggregate Predicate Pushdown did not take place, because there is no support for the simple predicate pushdown with the Computed Columns, and so the more complicated the formula is, the more time we spent calculating the final value – but for the scenarios where the performance and filtering are important we have PERSISTED computed columns, and so let us try them out for the Clustered Columnstore Index:
Persisted Computed Columns
Let’s recreate our table, this time making our calculated column [EarnedAmount] as a persisted one.
DROP TABLE IF EXISTS [dbo].[FactOnlineSales_Computed]; CREATE TABLE [dbo].[FactOnlineSales_Computed]( [OnlineSalesKey] [int] NOT NULL, [DateKey] [datetime] NOT NULL, [StoreKey] [int] NOT NULL, [ProductKey] [int] NOT NULL, [PromotionKey] [int] NOT NULL, [CurrencyKey] [int] NOT NULL, [CustomerKey] [int] NOT NULL, [SalesOrderNumber] [nvarchar](20) NOT NULL, [SalesOrderLineNumber] [int] NULL, [SalesQuantity] [int] NOT NULL, [SalesAmount] [money] NOT NULL, [ReturnQuantity] [int] NOT NULL, [ReturnAmount] [money] NULL, [DiscountQuantity] [int] NULL, [DiscountAmount] [money] NULL, [TotalCost] [money] NOT NULL, [UnitCost] [money] NULL, [UnitPrice] [money] NULL, [ETLLoadID] [int] NULL, [LoadDate] [datetime] NULL, [UpdateDate] [datetime] NULL, [EarnedAmount] as ([SalesAmount] - [ReturnAmount] - [DiscountAmount] - [TotalCost]) PERSISTED )
Now all we need to do is to create the Clustered Columnstore Index:
create clustered columnstore index PK_FactOnlineSales_Computed on dbo.FactOnlineSales_Computed;
The above query results in the following error message:
The statement failed because column 'EarnedAmount' on table 'FactOnlineSales_Computed' is a computed column. Columnstore index cannot include a computed column implicitly or explicitly.
First of all this is a huge pity that the computed columns are not supported right now, but the irritating part here is the error message itself, delivering wrong message to the final user – the non-persisted computed columns are in fact supported.
I understand that storing the data in the Columnstore Index is much more difficult than just calculating the values out of it, but this feature is very needed, especially if the calculations are extremely complex or if we want to use this computed value as a predicate.
Nonclustered Columnstore Index
The HTAP(aka Operational Anyltics) scenarios might have some computed columns, enabling some more complex scenarios that are avoiding views usage because of the existing code, for example – let’s take it to the test by creating a new table, containing NCCI and a computed column:
DROP TABLE IF EXISTS [dbo].[FactOnlineSales_ComputedNCCI]; CREATE TABLE [dbo].[FactOnlineSales_ComputedNCCI]( [OnlineSalesKey] [int] NOT NULL, [DateKey] [datetime] NOT NULL, [StoreKey] [int] NOT NULL, [ProductKey] [int] NOT NULL, [PromotionKey] [int] NOT NULL, [CurrencyKey] [int] NOT NULL, [CustomerKey] [int] NOT NULL, [SalesOrderNumber] [nvarchar](20) NOT NULL, [SalesOrderLineNumber] [int] NULL, [SalesQuantity] [int] NOT NULL, [SalesAmount] [money] NOT NULL, [ReturnQuantity] [int] NOT NULL, [ReturnAmount] [money] NULL, [DiscountQuantity] [int] NULL, [DiscountAmount] [money] NULL, [TotalCost] [money] NOT NULL, [UnitCost] [money] NULL, [UnitPrice] [money] NULL, [ETLLoadID] [int] NULL, [LoadDate] [datetime] NULL, [UpdateDate] [datetime] NULL, [EarnedAmount] as ([SalesAmount] - [ReturnAmount] - [DiscountAmount] - [TotalCost]) ); create nonclustered columnstore index PK_FactOnlineSales_ComputedNCCI on dbo.FactOnlineSales_ComputedNCCI (EarnedAmount) with ( maxdop = 1); GO
The error message resulting in the message will still be the generic ones, but its easy to interpret – no computed columns are supported for the Nonclustered Columnstore Indexes:
Msg 35307, Level 16, State 1, Line 28 The statement failed because column 'EarnedAmount' on table 'FactOnlineSales_ComputedNCCI' is a computed column. Columnstore index cannot include a computed column implicitly or explicitly.
Notice that we can still create a Nonclustered Columnstore Index on the table with a computed column, simply by not including the column into our index:
create nonclustered columnstore index PK_FactOnlineSales_ComputedNCCI on dbo.FactOnlineSales_ComputedNCCI (SalesAmount) with ( maxdop = 1);
In-Memory Columnstore Index
Let’s see if the In-Memory Columnstore Index supports computed column and I do not hold my breath here, especially since the on-disk Nonclustered Columnstore Index does not support them.
ALTER DATABASE [ContosoRetailDW] ADD FILEGROUP [ContosoRetailDW_Hekaton] CONTAINS MEMORY_OPTIMIZED_DATA GO ALTER DATABASE [ContosoRetailDW] ADD FILE(NAME = ContosoRetailDW_HekatonDir, FILENAME = 'C:\Data\ContosoXtp') TO FILEGROUP [ContosoRetailDW_Hekaton];
CREATE TABLE [dbo].[FactOnlineSales_Hekaton]( [OnlineSalesKey] [int] NOT NULL, [DateKey] [datetime] NOT NULL, [StoreKey] [int] NOT NULL, [ProductKey] [int] NOT NULL, [PromotionKey] [int] NOT NULL, [CurrencyKey] [int] NOT NULL, [CustomerKey] [int] NOT NULL, [SalesOrderNumber] [nvarchar](20) NOT NULL, [SalesOrderLineNumber] [int] NULL, [SalesQuantity] [int] NOT NULL, [SalesAmount] [money] NOT NULL, [ReturnQuantity] [int] NOT NULL, [ReturnAmount] [money] NULL, [DiscountQuantity] [int] NULL, [DiscountAmount] [money] NULL, [TotalCost] [money] NOT NULL, [UnitCost] [money] NULL, [UnitPrice] [money] NULL, [ETLLoadID] [int] NULL, [LoadDate] [datetime] NULL, [UpdateDate] [datetime] NULL, [EarnedAmount] as ([SalesAmount] - [ReturnAmount] - [DiscountAmount] - [TotalCost]), Constraint PK_FactOnlineSales_Hekaton PRIMARY KEY NONCLUSTERED HASH ([OnlineSalesKey]) WITH (BUCKET_COUNT = 2000000) ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA); alter table dbo.FactOnlineSales_Hekaton add INDEX NCCI_FactOnlineSales_Hekaton CLUSTERED COLUMNSTORE;
As expected, the script above generates an error message, since the computed columns are not supported for the Columnstore Indexes:
Msg 41844, Level 16, State 1, Line 28 Clustered columnstore indexes are not supported on memory optimized tables with computed columns. Msg 1750, Level 16, State 0, Line 28 Could not create constraint or index. See previous errors.
Final Thoughts
First of all – I am HUGELY excited about computed columns getting their way into the disk-based Clustered Columnstore Indexes. HUGELY – I have been missing this feature for way too long!
It is a huge pity that right now we do have support for the computed columns only in their persisted variant, and the only scenario that is supported is the one for the DWH (Data Warehousing) with disk-based Clustered Columnstore Indexes.
I do admit thought that mostly the usage of the computed columns in my experience is related to the DWH, mostly because regular developers have no need for calculations with complex formulas to deliver to the final user, and if there are such computations – they do not change frequently.
Still, I am hoping for a full implementation of the computed columns support for all Columnstore Indexes scenario, as I put in my original connect item: Include support for Calculated Columns on Columnstore Indexes, especially for the persisted computed columns.
Especially the predicate pushdown is the functionality that I am looking forward to … One day hopefully.
to be continued with Columnstore Indexes – part 109 (“Trivial Plans in SQL Server 2017”)