June 11, 2021 at 3:20 am
Hello,
I have two relatively simple update queries, but they act on around 10 million rows, so performance and tempdb are severely impacted.
--- query 1
Update dbo.test
set lasttime = (select max(time) from dbo.test)
--- query 2
update dbo.test
set colA = case when status = 'open' and date < dateadd(d, -1, LastModifiedMaxDateTime) then datediff(d, date, LastModifiedMaxDateTime) end,
    colB = case when status = 'open' and date < dateadd(d, -1, LastModifiedMaxDateTime) then 'Y' else 'N' end,
    colC = case when status = 'open' then datediff(d, date, maxdate) end
These queries are executed through an SSIS Execute SQL Task and take 45 minutes to run.
The table has a clustered index.
Any pointers toward improving performance would be appreciated.
June 11, 2021 at 12:53 pm
What columns are on the clustered index?
Do you have any other indexes on the table?
You know you could write the two update statements as just one update statement?
June 11, 2021 at 1:00 pm
It has a clustered index, but the clustered index column is not included in the where clause nor in the list of columns being updated.
Does dropping the clustered index before the update help performance?
June 11, 2021 at 1:33 pm
It has a clustered index, but the clustered index column is not included in the where clause nor in the list of columns being updated.
Does dropping the clustered index before the update help performance?
I doubt the clustered index is the cause of the problem, but you might want to rebuild it just so it's a bit more efficient when the query does a full scan of it.
Do you have any other indexes on the table?
The query should just do a scan of the clustered index to get the max(time) and another scan to update the table.
I would try rebuilding the clustered index and combining the two queries into one.
It might also be worth putting a where clause on the query so that rows which would be updated to the same values they already have are not touched at all.
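As a rough sketch, the combined statement might look like this (the column names are lifted from the original post, and the WHERE clause is an assumption about which rows can actually change, so verify it against the real data first):

UPDATE dbo.test
SET lasttime = (SELECT MAX(time) FROM dbo.test),
    colA = CASE WHEN status = 'open' AND date < DATEADD(d, -1, LastModifiedMaxDateTime) THEN DATEDIFF(d, date, LastModifiedMaxDateTime) END,
    colB = CASE WHEN status = 'open' AND date < DATEADD(d, -1, LastModifiedMaxDateTime) THEN 'Y' ELSE 'N' END,
    colC = CASE WHEN status = 'open' THEN DATEDIFF(d, date, maxdate) END
WHERE status = 'open'                                -- open rows can change colA/colB/colC
   OR ISNULL(colB, '') <> 'N'                        -- non-open rows only change if colB isn't already 'N'
   OR lasttime <> (SELECT MAX(time) FROM dbo.test)   -- or if lasttime is stale
   OR lasttime IS NULL;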
June 11, 2021 at 3:30 pm
Are you able to capture the execution plan for these? That could help determine what is actually happening. Do the queries run just as slow if you run them in SSMS rather than SSIS? If they run faster in SSMS, it may not hurt to convert those queries to stored procedures.
One thing that MAY help query 1's performance (may not) would be to toss the SELECT MAX result into a variable and do an update using the variable. I don't expect this to make a big difference (if any), but I've seen SQL do some weird things by tossing a single value into a variable rather than having it in the statement.
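A minimal sketch of that variable approach for query 1 (assuming lasttime and time are datetime columns; adjust the type to match the table):

-- Capture the MAX once, outside the UPDATE
DECLARE @MaxTime datetime;

SELECT @MaxTime = MAX(time) FROM dbo.test;

UPDATE dbo.test
SET lasttime = @MaxTime;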
With query number 2, you MAY get a performance boost by adding a nonclustered index on the STATUS column (if you don't have that already). You could also move the "status='open'" to a WHERE clause on that UPDATE which may help (or may make no difference).
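For illustration only (the index name and INCLUDE list are assumptions, and note that moving the status filter to a WHERE clause changes behaviour: non-open rows would keep their existing colB value instead of being set to 'N'):

-- Illustrative nonclustered index to support the status filter
CREATE NONCLUSTERED INDEX IX_test_status
    ON dbo.test (status)
    INCLUDE (date, LastModifiedMaxDateTime, maxdate);

-- Query 2 restricted to the rows the CASE expressions actually target
UPDATE dbo.test
SET colA = CASE WHEN date < DATEADD(d, -1, LastModifiedMaxDateTime) THEN DATEDIFF(d, date, LastModifiedMaxDateTime) END,
    colB = CASE WHEN date < DATEADD(d, -1, LastModifiedMaxDateTime) THEN 'Y' ELSE 'N' END,
    colC = DATEDIFF(d, date, maxdate)
WHERE status = 'open';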
I would try these things on a test version of the database and not live as it may make things worse.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
June 11, 2021 at 7:18 pm
Is there a reason you have 2 queries? Both look to be updating the same table and updating all rows in that table. This could be done using a single update statement.
What is the purpose of updating these columns? Where do these updates occur in the workflow - at the beginning or end of the load process?
In the second query, the code is only setting values for 'open' status rows. Is there a reason to reset non-'open' status rows to some value? The code would be setting those values to NULL because the CASE expression doesn't have an ELSE condition (on 2 of the 3 columns). If there is no reason to update the non-'open' rows, then maybe exclude those rows from the update process.
With that said - if this is a post-process update then I would recommend having the insert/update process in the SSIS workflow provide the correct values when the rows are updated/inserted at that time, instead of using a post-process update.
And finally, if none of that is possible - then yes, perform the updates in batches. I would start around 200,000 rows and adjust up or down as needed.
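A minimal sketch of a batched update, assuming an integer clustered key column named id (that name is an assumption; substitute the table's real clustered key):

DECLARE @BatchSize int = 200000;
DECLARE @CurrentId int, @MaxId int;

SELECT @CurrentId = MIN(id), @MaxId = MAX(id) FROM dbo.test;

WHILE @CurrentId <= @MaxId
BEGIN
    UPDATE dbo.test
    SET colB = CASE WHEN status = 'open' AND date < DATEADD(d, -1, LastModifiedMaxDateTime)
                    THEN 'Y' ELSE 'N' END
    WHERE id >= @CurrentId
      AND id < @CurrentId + @BatchSize;       -- each pass seeks a narrow key range

    SET @CurrentId = @CurrentId + @BatchSize; -- keeps each transaction, and the log, small
END;

Batching on the clustered key means each UPDATE is a range seek rather than a full scan, and the transaction log can be reused between batches (under the SIMPLE recovery model) or backed up as you go.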
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
June 11, 2021 at 7:29 pm
Thanks for the suggestions; I will try these and report back with the results.
Can we do the updates in batches? If I have a total record count of 10 million, then what would an appropriate batch size be for the above update statements?
It would be helpful if you could supply the DDL for the table and details of all the indexes on it. To do the job in batches efficiently you really need to search on an index, which is why I'm asking for these details.