ColumnStore Index - Rowgroup elimination - Inserting in Sort Order

  • Hi

    In 2014, In order to be able to do RowGroup elimination, we need "a particular column" to be in sorted order "within a partition".

    How do we ensure this particular column's values are in sorted order after a workload/batch ? Is it through MAXDOP = 1 feature when we create CCI Index?   For example, if my workload contains 3 million rows on the first day (when my table was empty), does MAXDOP = 1 guarantee that after the successful load, we have (nearly) 3 Rowgroups (and thus 3 segments) each with 1 million rows, and MAX value of Segment 1 is less-than-or-equal-to MIN value Segment 2, and so on?

    Extending this to day2, if I load further 3 million, will all the 6 rowgroups (and thus 6 segments) be in sorted order because of MAXDOP = 1 settings at the time of CCI index creation?

    thanks

  • I would think so, since SQL doesn't (re)order / sort rows going into a columnstore, it loads them in the same order in which they arrive.

    Thus, presumably the first 1,048,576 rows would go into rowgroup 1, the next 1,048,576 into rowgroup 2, etc., exactly as you described.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Thanks Scott, for the reply.  However, my question still remains unanswered...let me put it this way.

    https://blogs.msdn.microsoft.com/sql_server_team/columnstore-index-performance-rowgroup-elimination/

    https://www.sqlpassion.at/archive/2017/01/30/columnstore-segment-elimination/

    As explained in the above blogs, for SQL engine to be able to make decision on eliminating RowGroups that are not required, we need to ensure the data is stored in sorted order (so, SELECT ….ORDER BY  is not the solution required here)

    So, this raises the following questions

    1) How do we ensure the data is Inserted in SORT order into each of the partitions, for each of the daily batch workloads?

    2) Assuming the data is stored in the sorted order WITHIN EACH PARTITION, how do we hint SQL engine to make use of RowGroup elimination feature?  Is it through MAXDOP=1  settings?   If so, are we not compromising on the multithreaded efficient way of SQL's processing by using only 1 thread or CPU of many CPUs available?

    thanks

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply