How to speed up creating Clustered ColumnStore Index??

Question

Post reply

How to speed up creating Clustered ColumnStore Index??

SqlRon

SSC Enthusiast

Points: 179
More actions
November 13, 2022 at 10:27 am

#4112296

Hi
we have a table in our Azure SQLDB with 35 billion rows, partitioned on a date. Indeed, a lot of rows.
This table is already in production.
We started creating a clustered column store index on it, but after 7 days (!) we got a time-out.
Now we have created an empty copy of the original table, put the partition and clustered columnstore index on the copy and started copying day by day. But this will also take a lot of time. Average time for a day (in between 8 and 12 million record) is 1 hour.
Does anybody know if there is a faster way of handling this??
Regards
Ron

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

frederico_fonseca SSCoach Points: 16309 More actions · Answer 1

what is the spec of that database instance? that will have an impact on it regardless of how you do it

and were you creating the index online or offline? - if online its normal to take a lot, really a lot of time.

with regards to the option to create a copy of the table and inserting into it - speed can vary depending on how you do it and again depending on the spec of the database instance.

main thing that affects performance for a insert into select from on a columnstore index is the use of "tablock" so it can do it in parallel - but.... DO NOT do it within a transaction as there is a bug on multiple versions of SQL Server where this leads to a thread deadlock

SqlRon SSC Enthusiast Points: 179 More actions · Answer 2

Hi Frederico, exactly whisch specs do you want me to share?

frederico_fonseca SSCoach Points: 16309 More actions · Answer 3

for an azure sqldb its either how many vcores you allocated or how many dtu's depending on the model you chosen.

and how much ram and what are the disk types chosen

and if basic, standard or premium.

SqlRon SSC Enthusiast Points: 179 More actions · Answer 4

Hi Frederico

we use Hyperscale with 32V Cores

The index is created online. If we do it offline, how much time will it then take to put it online again?

frederico_fonseca SSCoach Points: 16309 More actions · Answer 5

How busy is that server?

for that spec the inserts into new table should not take 1 hour for a pitiful 12 million rows unless server is nearly maxed.

regarding table

- are old records ever updated/deleted or static. and how many new records per day added/updated/deleted.

- are old records required for day to day querying or they are randomly queried for example just a quarter/year end

partitions - what is the row count of each partition on existing table

indexes - do you have a single index on that table or multiple - if multiple are they partitioned aligned or are there some that are not.

using the new table approach (which may not be the best option) would you be on a position to test the following - ensuring that the new table only contains the columnstore index, so we can take a baseline duration for the operation.

insert into newtable with (tablock) select * from old table where $partition.<partitionfunctionname>(<partitioncolumn>) = <partitionnumber>

where partitionnumber = one of the average count partitions on the main table

SqlRon SSC Enthusiast Points: 179 More actions · Answer 6

Hi see the answers below.

Our main goal is try to copy huge volume (30 billion+) data from table A (row-store) to table B (column store), some way we are looking for a solution to make this process much faster.

How busy is that server?

Server is 70 to 80 % busy during midnight until 8 AM, during this period all the batches (Data Ingestion into Warehouse) executes.

for that spec the inserts into new table should not take 1 hour for a pitiful 12 million rows unless server is nearly maxed.

Per day volume is 10 million to 25 million average.

regarding table

- are old records ever updated/deleted or static. and how many new records per day added/updated/deleted.

In this particular scenario only insert, there is no update or delete.

- are old records required for day to day querying or they are randomly queried for example just a quarter/year end

Old records getting queried on an aggregated basis only, no ad-hoc or random date select scenario.

partitions - what is the row count of each partition on existing table

partition Is based on day – as mentioned above Per day volume is 10 million to 25 million average.

indexes - do you have a single index on that table or multiple - if multiple are they partitioned aligned or are there some that are not.

Only single index – clustered partitioned aligned indx.

using the new table approach (which may not be the best option) would you be on a position to test the following - ensuring that the new table only contains the columnstore index, so we can take a baseline duration for the operation.

insert into newtable with (tablock) select * from old table where $partition.<partitionfunctionname>(<partitioncolumn>) = <partitionnumber>

where partitionnumber = one of the average count partitions on the main table

This we have already tested, hope you are trying to see whether the new table (with column store index) able to insert records into it, its already tested and working fine.

Please revert if you meant this test for any other purpose in case.

frederico_fonseca SSCoach Points: 16309 More actions · Answer 7

thanks for replies - does clarify things a lot.

regarding the particular test

insert into newtable with (tablock) select * from old table where $partition.<partitionfunctionname>(<partitioncolumn>) = <partitionnumber>

where partitionnumber = one of the average count partitions on the main table

This we have already tested, hope you are trying to see whether the new table (with column store index) able to insert records into it, its already tested and working fine.

my point was if you had tried exactly as I mentioned. e.g. using the tablock and the $partition where clause. and intention is to see how long that one alone takes - when server isn't busy!!

I'll get back with more things later (after my work hours)

SqlRon SSC Enthusiast Points: 179 More actions · Answer 8

Hi Frederico, the tablock speeded things up. Thnx

Another question.

If a have a clustered columnstore index on a table, can it then still be useful to have a non-clustered index on an attribute that is often used in a lookup query, such as accountnumber? If so, why is that benificiary?

Sasuke2690 Newbie Points: 2 More actions · Answer 9

Increasing: The INSERT process will be improved by using an increasing column, such as the IDENTITY column, as a clustered index key, which will directly insert the new values at the logical end of the table.

spider solitaire 2 suit

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 10

It sounds like there are multiple years (maybe 6 or so?) stored in the table. Column Store or not , I'd split it up into multiple tables, at least by year but, maybe even by quarter or month.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)