Mass update on Clustered Columnstore indexes

  • Hi,

    we have a FACT table with 35 billion rows, partitioned on daily level.

    In a test environment we put a clustered colmnstore index on this table. Storage reduced with 300%, reading data became much faster and inserting became a little faster. No updates are doen on this table.

    Now we have a second FACT table with 7 billion rows, partitioned on month.

    Daily there are around 7 billion inserts and 1.5 million updates. These updates can be done on the last 12 partitions.

    We want to add a clustered columnstore index on this table. But CCI and mass updates don't go well together I read.

    Is it a possibility to create a second table, with same partitioning but without CCI and then switch 12 partitions to this temporary table?

    Do the updates and inserts there and switch the partitions back to the original table? Or should this temporary table also have the same CCI?

    Are there other solutions to this problem?

    Regards

    Ron

  • The table needs the same CCI. When I need to do large updates to columnstores I switch one partition into a table with the same clustered columnstore, then create a clustered rowstore using the same name with drop_existing = on, perform the update, recreate the columnstore and switch back. My partitions are not very large, mostly under 15 million so it works well. The slowest part is the clustered rowstore create, but it does allow you to create some kind of rough ordering on the columnstore partitions if you want.

    1.5 million updates is a small number, so you could perform the updates directly on the columnstore. This results in new rows being stored in the delta store and changed rows flagged as deleted. The impact on performance may be acceptable, but I would test it. We do perform updates on some columnstores, and at the weekend I run maintenance to rebuild any partitions with deleted rows.

  • Hi Ron,

    You can definitely use partition switching in combination with columnstore indexes. And that can be a great way to optimize mass updates.

    During the time of the partition switch, the indexes need to be exactly the same. So at that time you do need to have a columnstore index on the worktable that contains the data to be switched in. But you can add that columnstore index just before the partition switch.

    So in short: Switch partition to be changed to worktable. Drop columnstore index. Perform mass update. Create columnstore index. Switch worktable back into partition.

    Do test this to verify whether it is actually faster than doing the mass update directly on the table. A lot of engineering work has been done on columnstore indexes to implement lots of optimizations. Perhaps the overhead of the mass update is not as bad as it used to be.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply