November 14, 2017 at 5:03 am
Hi Champs,
I have an update query against a table with around 50 crores (500 million) rows. The full table (120 columns) is a data warehouse fact table with a columnstore index, and a non-clustered index on 5 columns is created over the same columns that I want to update. The indexes are helpful for selects, so I can't drop or disable them. I understand that the indexes are why the update is taking so long (9 hours).
I still need to tune this query. Any help please?
The query looks like this:
WITH cte AS
(
    SELECT
        [Local_Date_Key],
        [Associate_Profile_key],
        AP.Associateid,
        F.Record_update_date,
        F.Record_Updated_By
    FROM [Fact_table] AS F
    JOIN dimentions AS AP
        ON F.[Associate_Profile_key] = AP.Associate_key
        AND local_date_key >= 20160101
)
UPDATE c
SET c.Associate_Profile_key = AP.Associate_key,
    c.Record_update_date    = sysdatetime(),
    c.Record_Updated_By     = suser_name()
FROM cte AS c
JOIN dimentions AS AP
    ON c.Associateid = AP.Associateid
    AND c.[Associate_Profile_key] <> AP.Associate_key;
Regards,
Ravi
November 15, 2017 at 8:18 pm
First, you need to improve the SELECT part of your query. Also, from the query as posted it is difficult to tell which table the [Local_Date_Key] column belongs to.
Can you show the execution plan of this query?
You may need indexes on AP.Associate_key and on F.[Associate_Profile_key], with the join and filter columns (such as Associateid and local_date_key) as included columns.
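For illustration only, such indexes might look something like this (the index names and included columns are my assumptions, not tested against your schema):
CREATE NONCLUSTERED INDEX IX_dimentions_Associate_key
    ON dimentions (Associate_key)
    INCLUDE (Associateid);
CREATE NONCLUSTERED INDEX IX_Fact_Associate_Profile_key
    ON [Fact_table] (Associate_Profile_key)
    INCLUDE (Local_Date_Key);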
If the query touches more than 20-30% of all records, the most efficient way would be: remove the indexes -> run the update -> re-create the indexes.
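A rough sketch of that pattern (the index name here is a placeholder; only disable non-clustered indexes, since disabling the clustered index makes the table inaccessible):
ALTER INDEX IX_Fact_NonClustered ON [Fact_table] DISABLE;
-- run the UPDATE here
ALTER INDEX IX_Fact_NonClustered ON [Fact_table] REBUILD;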
November 20, 2017 at 2:24 pm
Updates to a columnstore are not a good idea. Columnstore handles inserts just fine, but updates and deletes slow down access to the entire table. Columnstores were made to be populated and read, and should not be used for OLTP workloads.
If you must do regular updates/deletes, change the table to use a rowstore index or a heap and then create a non-clustered columnstore index for your reporting.
If the updates/deletes are sporadic, select the updated data into a new columnstore table, then swap the old table with the new one. I once had to run an update against a 100 billion row table. I ended up using this approach, and each day's worth of data (about 100 million records) only took about 90 seconds to copy over.
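A minimal sketch of that rebuild-and-swap pattern, assuming made-up object names (Fact_table_new, Fact_table_old, CCI_Fact_table_new) and leaving the actual correction logic to you:
-- 1. Copy the data into a new table, applying the corrections during the copy.
SELECT *
INTO   dbo.Fact_table_new
FROM   dbo.Fact_table;   -- replace SELECT * with the corrected column list / joins

-- 2. Build the columnstore on the new copy.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_Fact_table_new ON dbo.Fact_table_new;

-- 3. Swap the tables with quick metadata-only renames.
BEGIN TRANSACTION;
    EXEC sp_rename 'dbo.Fact_table', 'Fact_table_old';
    EXEC sp_rename 'dbo.Fact_table_new', 'Fact_table';
COMMIT TRANSACTION;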
Wes
(A solid design is always preferable to a creative workaround)
November 20, 2017 at 10:19 pm
Thanks a lot for your suggestion.
Can you please elaborate more on this: "If the updates/deletes are sporadic, select the updated data into a new columnstore table, then swap the old table with the new one."? I need to run this update query every day.
This is my update query; you can refer to my earlier post for the complete query.
UPDATE c
SET c.Associate_Profile_key = AP.Associate_key,
    c.Record_update_date    = sysdatetime(),
    c.Record_Updated_By     = suser_name()
FROM cte AS c
JOIN dimentions AS AP
    ON c.Associateid = AP.Associateid
    AND c.[Associate_Profile_key] <> AP.Associate_key;
November 21, 2017 at 6:43 am
I would consider updating once daily to be sporadic rather than a traditional OLTP model. It is feasible to create a process to handle updating the data and then clean up the indexes. However, handling updates for large volumes of data is not something that can easily be addressed through a forum post. You need to educate yourself on how data is stored, logically and physically, in SQL Server, and on the trade-offs between the index types.
Bottom line: you should not use a clustered columnstore index to store data that you may need to update or delete. Columnstore is tuned for high-speed inserts and reads only. That's it. If you're doing anything else with the data, you're causing yourself performance problems.
I suggest looking into using a clustered index (rowstore) with a non-clustered columnstore index; a rough sketch is below.
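As an illustration only (the existing clustered columnstore would have to be dropped first, and the key and column choices here are my assumptions, not a tested design):
CREATE CLUSTERED INDEX CIX_Fact_table
    ON dbo.Fact_table (Local_Date_Key, Associate_Profile_key);

CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Fact_table
    ON dbo.Fact_table (Local_Date_Key, Associate_Profile_key, Record_update_date, Record_Updated_By);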
Wes
(A solid design is always preferable to a creative workaround)
November 21, 2017 at 7:42 am
ravi@sql - Monday, November 20, 2017 10:19 PM: Thanks a lot for your suggestion. Can you please elaborate more on this: "If the updates/deletes are sporadic, select the updated data into a new columnstore table, then swap the old table with the new one."? I need to run this update query every day. This is my update query; you can refer to my earlier post for the complete query. [update query quoted above]
-- Reduce the query to its simplest form
UPDATE f SET
Associate_Profile_key = ap2.Associate_key,
Record_update_date = sysdatetime(),
Record_Updated_By = suser_name()
FROM Fact_table f
INNER JOIN dimentions ap1
ON f.Associate_Profile_key = ap1.Associate_key
AND ap1.local_date_key >= 20160101
INNER JOIN dimentions ap2
ON ap1.Associateid = ap2.Associateid
AND f.Associate_Profile_key <> ap2.Associate_key;
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden