November 14, 2017 at 5:03 am
Hi Champs,
I have an update query against a table with around 50 crores (500 million) rows. The full table (120 columns) is a data warehouse fact table with a columnstore index, and a non-clustered index on 5 columns is created over the same columns that I want to update. The indexes are helpful for selects, so I can't drop or disable them. I understand that the indexes are why the update is taking so long (9 hours).
I still need to tune this query. Any help please?
The query looks like this:
WITH cte AS
(
    SELECT
        [Local_Date_Key],
        [Associate_Profile_key],
        AP.Associateid,
        F.Record_update_date,
        F.Record_Updated_By
    FROM [Fact_table] AS F
    JOIN dimentions AS AP
        ON F.[Associate_Profile_key] = AP.Associate_key
        AND local_date_key >= 20160101
)
UPDATE c
SET c.Associate_Profile_key = AP.Associate_key,
    c.Record_update_date    = sysdatetime(),
    c.Record_Updated_By     = suser_name()
FROM cte AS c
JOIN dimentions AS AP
    ON c.Associateid = AP.Associateid
    AND c.[Associate_Profile_key] <> AP.Associate_key;
Regards,
Ravi
November 15, 2017 at 8:18 pm
First, you need to improve the SELECT part of your query. Also, from the query as posted it is difficult to tell which table the [Local_Date_Key] column belongs to.
Can you show the execution plan of this query?
You may need indexes on AP.Associate_key and on F.[Associate_Profile_key], with the join and filter columns (such as Associateid and local_date_key) as included columns.
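For illustration only, such indexes might look something like this (the index names and included columns are my assumptions, not tested against your schema):
CREATE NONCLUSTERED INDEX IX_dimentions_Associate_key
    ON dimentions (Associate_key)
    INCLUDE (Associateid);
CREATE NONCLUSTERED INDEX IX_Fact_Associate_Profile_key
    ON [Fact_table] (Associate_Profile_key)
    INCLUDE (Local_Date_Key);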
If the query touches more than 20-30% of all records, the most efficient way would be: remove the indexes -> run the update -> re-create the indexes.
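A rough sketch of that pattern (the index name here is a placeholder; only disable non-clustered indexes, since disabling the clustered index makes the table inaccessible):
ALTER INDEX IX_Fact_NonClustered ON [Fact_table] DISABLE;
-- run the UPDATE here
ALTER INDEX IX_Fact_NonClustered ON [Fact_table] REBUILD;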
November 20, 2017 at 2:24 pm
Updates to a columnstore are not a good idea. Columnstore handles inserts just fine, but updates and deletes slow down access to the entire table. Columnstores were made to be populated and read, and should not be used for OLTP workloads.
If you must do regular updates/deletes, change the table to use a rowstore index or a heap and then create a non-clustered columnstore index for your reporting.
If the updates/deletes are sporadic, select the updated data into a new columnstore table, then swap the old table with the new one. I once had to run an update against a 100 billion row table. I ended up using this approach, and each day's worth of data (about 100 million records) only took about 90 seconds to copy over.
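A minimal sketch of that rebuild-and-swap pattern, assuming made-up object names (Fact_table_new, Fact_table_old, CCI_Fact_table_new) and leaving the actual correction logic to you:
-- 1. Copy the data into a new table, applying the corrections during the copy.
SELECT *
INTO   dbo.Fact_table_new
FROM   dbo.Fact_table;   -- replace SELECT * with the corrected column list / joins

-- 2. Build the columnstore on the new copy.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_Fact_table_new ON dbo.Fact_table_new;

-- 3. Swap the tables with quick metadata-only renames.
BEGIN TRANSACTION;
    EXEC sp_rename 'dbo.Fact_table', 'Fact_table_old';
    EXEC sp_rename 'dbo.Fact_table_new', 'Fact_table';
COMMIT TRANSACTION;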
Wes
(A solid design is always preferable to a creative workaround)
November 20, 2017 at 10:19 pm
Thanks a lot for your suggestion.
Can you please elaborate more on this: "If the updates/deletes are sporadic, select the updated data into a new columnstore table, then swap the old table with the new one."? I need to run this update query every day.
This is my update query; you can refer to my earlier post for the complete query.
UPDATE c
SET c.Associate_Profile_key = AP.Associate_key,
    c.Record_update_date    = sysdatetime(),
    c.Record_Updated_By     = suser_name()
FROM cte AS c
JOIN dimentions AS AP
    ON c.Associateid = AP.Associateid
    AND c.[Associate_Profile_key] <> AP.Associate_key;
November 21, 2017 at 6:43 am
I would consider updating once daily to be sporadic rather than a traditional OLTP model. It is feasible to create a process to handle updating the data and then clean up the indexes. However, handling updates for large volumes of data is not something that can easily be addressed through a forum post. You need to educate yourself on how data is stored, logically and physically, in SQL Server, and on the trade-offs between the index types.
Bottom line: you should not use a clustered columnstore index to store data that you may need to update or delete. Columnstore is tuned for high-speed inserts and reads only. That's it. If you're doing anything else with the data, you're causing yourself performance problems.
I suggest looking into using a clustered index (rowstore) with a non-clustered columnstore index; a rough sketch is below.
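As an illustration only (the existing clustered columnstore would have to be dropped first, and the key and column choices here are my assumptions, not a tested design):
CREATE CLUSTERED INDEX CIX_Fact_table
    ON dbo.Fact_table (Local_Date_Key, Associate_Profile_key);

CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Fact_table
    ON dbo.Fact_table (Local_Date_Key, Associate_Profile_key, Record_update_date, Record_Updated_By);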
Wes
(A solid design is always preferable to a creative workaround)
November 21, 2017 at 7:42 am
ravi@sql - Monday, November 20, 2017 10:19 PM: Thanks a lot for your suggestion. Can you please elaborate more on this: "If the updates/deletes are sporadic, select the updated data into a new columnstore table, then swap the old table with the new one."? I need to run this update query every day. This is my update query; you can refer to my earlier post for the complete query. [update query quoted above]
-- Reduce the query to its simplest form
UPDATE f SET
Associate_Profile_key = ap2.Associate_key,
Record_update_date = sysdatetime(),
Record_Updated_By = suser_name()
FROM Fact_table f
INNER JOIN dimentions ap1
ON f.Associate_Profile_key = ap1.Associate_key
AND ap1.local_date_key >= 20160101
INNER JOIN dimentions ap2
ON ap1.Associateid = ap2.Associateid
AND f.Associate_Profile_key <> ap2.Associate_key;
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden