November 13, 2022 at 10:27 am
Hi
we have a table in our Azure SQLDB with 35 billion rows, partitioned on a date. Indeed, a lot of rows.
This table is already in production.
We started creating a clustered column store index on it, but after 7 days (!) we got a time-out.
Now we have created an empty copy of the original table, put the partition and clustered columnstore index on the copy and started copying day by day. But this will also take a lot of time. Average time for a day (in between 8 and 12 million record) is 1 hour.
Does anybody know if there is a faster way of handling this??
Regards
Ron
November 13, 2022 at 12:53 pm
what is the spec of that database instance? that will have an impact on it regardless of how you do it
and were you creating the index online or offline? - if online its normal to take a lot, really a lot of time.
with regards to the option to create a copy of the table and inserting into it - speed can vary depending on how you do it and again depending on the spec of the database instance.
main thing that affects performance for a insert into select from on a columnstore index is the use of "tablock" so it can do it in parallel - but.... DO NOT do it within a transaction as there is a bug on multiple versions of SQL Server where this leads to a thread deadlock
November 13, 2022 at 3:18 pm
Hi Frederico, exactly whisch specs do you want me to share?
November 13, 2022 at 4:21 pm
for an azure sqldb its either how many vcores you allocated or how many dtu's depending on the model you chosen.
and how much ram and what are the disk types chosen
and if basic, standard or premium.
November 14, 2022 at 7:49 am
Hi Frederico
we use Hyperscale with 32V Cores
The index is created online. If we do it offline, how much time will it then take to put it online again?
November 14, 2022 at 9:14 am
How busy is that server?
for that spec the inserts into new table should not take 1 hour for a pitiful 12 million rows unless server is nearly maxed.
regarding table
- are old records ever updated/deleted or static. and how many new records per day added/updated/deleted.
- are old records required for day to day querying or they are randomly queried for example just a quarter/year end
partitions - what is the row count of each partition on existing table
indexes - do you have a single index on that table or multiple - if multiple are they partitioned aligned or are there some that are not.
using the new table approach (which may not be the best option) would you be on a position to test the following - ensuring that the new table only contains the columnstore index, so we can take a baseline duration for the operation.
insert into newtable with (tablock) select * from old table where $partition.<partitionfunctionname>(<partitioncolumn>) = <partitionnumber>
where partitionnumber = one of the average count partitions on the main table
November 14, 2022 at 2:22 pm
Hi see the answers below.
Our main goal is try to copy huge volume (30 billion+) data from table A (row-store) to table B (column store), some way we are looking for a solution to make this process much faster.
How busy is that server?
Server is 70 to 80 % busy during midnight until 8 AM, during this period all the batches (Data Ingestion into Warehouse) executes.
for that spec the inserts into new table should not take 1 hour for a pitiful 12 million rows unless server is nearly maxed.
Per day volume is 10 million to 25 million average.
regarding table
- are old records ever updated/deleted or static. and how many new records per day added/updated/deleted.
In this particular scenario only insert, there is no update or delete.
- are old records required for day to day querying or they are randomly queried for example just a quarter/year end
Old records getting queried on an aggregated basis only, no ad-hoc or random date select scenario.
partitions - what is the row count of each partition on existing table
partition Is based on day – as mentioned above Per day volume is 10 million to 25 million average.
indexes - do you have a single index on that table or multiple - if multiple are they partitioned aligned or are there some that are not.
Only single index – clustered partitioned aligned indx.
using the new table approach (which may not be the best option) would you be on a position to test the following - ensuring that the new table only contains the columnstore index, so we can take a baseline duration for the operation.
insert into newtable with (tablock) select * from old table where $partition.<partitionfunctionname>(<partitioncolumn>) = <partitionnumber>
where partitionnumber = one of the average count partitions on the main table
This we have already tested, hope you are trying to see whether the new table (with column store index) able to insert records into it, its already tested and working fine.
Please revert if you meant this test for any other purpose in case.
November 14, 2022 at 3:14 pm
thanks for replies - does clarify things a lot.
regarding the particular test
insert into newtable with (tablock) select * from old table where $partition.<partitionfunctionname>(<partitioncolumn>) = <partitionnumber>
where partitionnumber = one of the average count partitions on the main table
This we have already tested, hope you are trying to see whether the new table (with column store index) able to insert records into it, its already tested and working fine.
my point was if you had tried exactly as I mentioned. e.g. using the tablock and the $partition where clause. and intention is to see how long that one alone takes - when server isn't busy!!
I'll get back with more things later (after my work hours)
November 18, 2022 at 9:07 am
Hi Frederico, the tablock speeded things up. Thnx
Another question.
If a have a clustered columnstore index on a table, can it then still be useful to have a non-clustered index on an attribute that is often used in a lookup query, such as accountnumber? If so, why is that benificiary?
December 5, 2022 at 9:55 am
Increasing: The INSERT process will be improved by using an increasing column, such as the IDENTITY column, as a clustered index key, which will directly insert the new values at the logical end of the table.
December 6, 2022 at 2:18 pm
It sounds like there are multiple years (maybe 6 or so?) stored in the table. Column Store or not , I'd split it up into multiple tables, at least by year but, maybe even by quarter or month.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply