
Columnstore Indexes – part 81 (“Adding Columnstore Index to InMemory Tables”)


Continuation from the previous 80 parts, the whole series can be found at http://www.nikoport.com/columnstore/.

Microsoft is doing an amazing job with every single point release on the way to SQL Server 2016 RTM. For almost a year, every single month they have released CTPs (Community Technology Previews), improving the current capabilities of SQL Server and allowing clients to follow the product's growth and help perfect the upcoming release.

I am writing these words of praise because I have followed the path of evolution of the InMemory solution, and because I am finally really happy with it.

I have not written much about InMemory Operational Analytics before, mostly because I have been waiting for the RC & RTM versions: the previous CTP releases had a number of serious limitations – but hey, what would you expect from work-in-progress releases?

For testing the newest improvements, I have taken RC 1 (Release Candidate 1) of SQL Server 2016 and the free ContosoRetailDW database:

USE master;
GO

-- Bring the database into single-user mode before restoring over it
ALTER DATABASE ContosoRetailDW
    SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

RESTORE DATABASE [ContosoRetailDW]
    FROM DISK = N'C:\Install\ContosoRetailDW.bak' WITH FILE = 1,
        MOVE N'ContosoRetailDW2.0' TO N'C:\Data\ContosoRetailDW.mdf',
        MOVE N'ContosoRetailDW2.0_log' TO N'C:\Data\ContosoRetailDW.ldf',
        NOUNLOAD, STATS = 5;

ALTER DATABASE ContosoRetailDW
    SET MULTI_USER;
GO

USE ContosoRetailDW;
GO
-- SQL Server 2016 compatibility level
ALTER DATABASE [ContosoRetailDW] SET COMPATIBILITY_LEVEL = 130;
GO
-- Pre-grow the data & log files to avoid auto-growth during the tests
ALTER DATABASE [ContosoRetailDW] MODIFY FILE ( NAME = N'ContosoRetailDW2.0', SIZE = 2000000KB, FILEGROWTH = 128000KB );
GO
ALTER DATABASE [ContosoRetailDW] MODIFY FILE ( NAME = N'ContosoRetailDW2.0_log', SIZE = 400000KB, FILEGROWTH = 256000KB );
GO

One of the key features added in the first Release Candidate of SQL Server 2016 is the capability to add and remove a columnstore index after the InMemory table has been created.

This is quite a normal feature in itself, one that anyone would expect from a solid release – that's for sure; but anyone working & experimenting with the previous releases was surely anxious to see the limits of the test versions removed.

As we learned in SQL Server 2012 with the Nonclustered Columnstore Indexes, the capability of adding and dropping columnstore indexes can be very important, and I am confident that there will be enough applications going through an upgrade that will still use the old technique described in Columnstore Indexes – part 79 (“Loading Data into Non-Updatable Nonclustered Columnstore”).

Modifying an InMemory table comes at a big cost: we need at least as much free memory available as the space occupied by the original table plus all its modifications. This means that under the hood the InMemory table data is copied over to a new structure. What would that mean for adding a Columnstore Index? How fast would it be?
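
Before running such an operation on a bigger table, it is worth checking how much memory the table and its indexes currently consume. Here is a quick sketch with the sys.dm_db_xtp_table_memory_stats DMV (referencing dbo.FactOnlineSales_Hekaton, the test table created further below):

-- Memory footprint of the memory-optimised table and its indexes
select object_name(object_id) as table_name,
       memory_allocated_for_table_kb,
       memory_used_by_table_kb,
       memory_allocated_for_indexes_kb,
       memory_used_by_indexes_kb
    from sys.dm_db_xtp_table_memory_stats
    where object_id = object_id('dbo.FactOnlineSales_Hekaton');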

Let’s start the test by adding the Memory Optimised File Group to our test database:

ALTER DATABASE [ContosoRetailDW]
    ADD FILEGROUP [ContosoRetailDW_Hekaton]
    CONTAINS MEMORY_OPTIMIZED_DATA;
GO
ALTER DATABASE [ContosoRetailDW]
    ADD FILE (NAME = ContosoRetailDW_HekatonDir,
              FILENAME = 'C:\Data\xtp')
    TO FILEGROUP [ContosoRetailDW_Hekaton];
GO
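
Before moving on, a quick sanity check that the filegroup was created correctly – it should be reported with the MEMORY_OPTIMIZED_DATA_FILEGROUP type:

-- Verify the Memory Optimised filegroup and its container
select fg.name, fg.type_desc, df.physical_name
    from sys.filegroups fg
        left join sys.database_files df
            on fg.data_space_id = df.data_space_id
    where fg.type = 'FX';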

For the further tests, I will load 2, 4 & 6 million rows from the dbo.FactOnlineSales table into the test InMemory tables, measuring the time for adding the Hash & Columnstore indexes afterwards. Notice that in the sample below I have sized the buckets of the Primary Key Nonclustered Hash Index for 2 million rows, but the structure I used for the tests always matched the executed scenario:

CREATE TABLE [dbo].[FactOnlineSales_Hekaton](
    [OnlineSalesKey] [int] NOT NULL,
    [DateKey] [datetime] NOT NULL,
    [StoreKey] [int] NOT NULL,
    [ProductKey] [int] NOT NULL,
    [PromotionKey] [int] NOT NULL,
    [CurrencyKey] [int] NOT NULL,
    [CustomerKey] [int] NOT NULL,
    [SalesOrderNumber] [nvarchar](20) NOT NULL,
    [SalesOrderLineNumber] [int] NULL,
    [SalesQuantity] [int] NOT NULL,
    [SalesAmount] [money] NOT NULL,
    [ReturnQuantity] [int] NOT NULL,
    [ReturnAmount] [money] NULL,
    [DiscountQuantity] [int] NULL,
    [DiscountAmount] [money] NULL,
    [TotalCost] [money] NOT NULL,
    [UnitCost] [money] NULL,
    [UnitPrice] [money] NULL,
    [ETLLoadID] [int] NULL,
    [LoadDate] [datetime] NULL,
    [UpdateDate] [datetime] NULL,
    Constraint PK_FactOnlineSales_Hekaton PRIMARY KEY NONCLUSTERED HASH ([OnlineSalesKey])
        WITH (BUCKET_COUNT = 2000000)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

Now, let’s load 2 million rows into this table:

insert into dbo.FactOnlineSales_Hekaton
select top 2000000 *
from dbo.FactOnlineSales;
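
Since the bucket count of the primary key was sized for exactly 2 million rows, it is worth verifying how well the hash buckets are being used after the load – a quick sketch with the sys.dm_db_xtp_hash_index_stats DMV:

-- Bucket utilisation of the hash indexes on the test table
select object_name(stat.object_id) as table_name,
       ind.name as index_name,
       stat.total_bucket_count,
       stat.empty_bucket_count,
       stat.avg_chain_length,
       stat.max_chain_length
    from sys.dm_db_xtp_hash_index_stats stat
        inner join sys.indexes ind
            on stat.object_id = ind.object_id and stat.index_id = ind.index_id
    where stat.object_id = object_id('dbo.FactOnlineSales_Hekaton');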

Before proceeding with adding a new index to the table, I executed a checkpoint operation, forcing the clearance of the transaction log:

Checkpoint;

I rebuilt the table completely after each of the tests; below, I am adding a Clustered Columnstore index while measuring the time for the operation:

set statistics time, io on
alter table dbo.FactOnlineSales_Hekaton
add INDEX NCCI_FactOnlineSales_Hekaton
CLUSTERED COLUMNSTORE;
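
Once the columnstore index is in place, we can take a peek at its row groups. Here is a sketch with the sys.dm_db_column_store_row_group_physical_stats DMV, new in SQL Server 2016 (assuming it covers the InMemory columnstore as well – sys.column_store_row_groups is an alternative to try):

-- Row groups of the freshly added columnstore index
select object_name(object_id) as table_name,
       row_group_id,
       state_desc,
       total_rows,
       deleted_rows
    from sys.dm_db_column_store_row_group_physical_stats
    where object_id = object_id('dbo.FactOnlineSales_Hekaton')
    order by row_group_id;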

Now, after rebuilding the whole table from scratch, I added a Nonclustered Hash index instead of the columnstore one:

set statistics time, io on
alter table dbo.FactOnlineSales_Hekaton
add INDEX NCCI_FactOnlineSales_Hekaton NONCLUSTERED HASH 
(OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate) 
WITH (BUCKET_COUNT = 2000000);
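
A quick way to confirm the new hash index and its bucket count is the sys.hash_indexes catalog view:

-- All hash indexes on the test table, with their bucket counts
select object_name(object_id) as table_name,
       name as index_name,
       bucket_count
    from sys.hash_indexes
    where object_id = object_id('dbo.FactOnlineSales_Hekaton');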

I have started by measuring the impact of log writes in each of the scenarios with the following command:

select count(*) as log_record_count, sum([log record length]) as total_log_bytes
    from fn_dblog(null, null);
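
To see where those log records actually come from, the same (undocumented) function can be aggregated per operation – a drill-down sketch, using the column names as exposed by fn_dblog:

-- Which log operations produce the bulk of the writes
select [Operation],
       count(*) as record_count,
       sum([Log Record Length]) as total_length
    from fn_dblog(null, null)
    group by [Operation]
    order by total_length desc;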

The results for the Columnstore Indexes are presented below:

Adding Columnstore Indexes - Log Writes

One can clearly notice that the scaling is very much linear – every 2 million rows added to the source table translate into around 380 MB of information in the transaction log. If you are adding a Columnstore Index to a table where you already have a lot of data, be prepared for a lot of activity in your transaction log, and you had better have a fast drive ready.

But how do these results compare to adding a nonclustered hash index to an InMemory table? Here is something to compare with:

Adding Columnstore Indexes Compared - Log Writes

Yes, you are reading the results correctly – the log writes for adding a nonclustered hash index do not go beyond 300 KB. This is basically a metadata operation with highly optimised write procedures, which are extremely effective, since their impact does not grow linearly (230-280 KB of writes for anywhere between 2 and 6 million rows).

Now let’s move on to the elapsed time for the operations:
Elapsed Time Adding Columnstore Index

The times are pretty linear, and they do not look very bad – we pay around 7 seconds for every 2 million rows. Of course, this might have to do with the amount of CPU power I had in my test VM. Let's take a look at the elapsed times spent on the addition of the Nonclustered Hash Indexes:
Elapsed Time - Adding Hash Index

The situation here is obviously different: the time for the 2 million row table goes as low as 3.4 seconds, while scaling very well to 5.8 seconds for the 6 million row table. This scaling is definitely not the same as in the case of the Columnstore Index – even modifying the 6 million row table with an addition of the Nonclustered Hash index takes less time than adding a Columnstore Index to a 2 million row table.

Let's put things in perspective by joining the information into a single graph:
Elapsed Time - Adding Both Indexes

This graph shows the same trend as the one for the transaction log writes, though the difference here is of course not nearly as dramatic; the optimised log writes remain the biggest advantage of the nonclustered hash indexes.

What about the CPU time? Here is the graph with the detailed information:
CPU Time - Adding Both Indexes

The CPU time graph shows the same tendency for the Nonclustered Hash Indexes as the transaction log writes did.

I am really impressed with the optimised log writes of the Nonclustered Hash Index addition, and I hope that Microsoft will implement something similar for the Columnstore Indexes as well.

The downtime (the table goes offline during the ALTER TABLE command) is something that large-scale installations will suffer from, but the overall progress is good – it is really great!

to be continued with Columnstore Indexes – part 82 (“Extended Events in SQL Server 2016”)
