May 10, 2023 at 6:36 am
I have an archiving job running on our production database which moves old data from the production DB to an archiving DB running on the same box.
The job relies on MERGE INTO operations to do the archiving. This isn't a new job and it has been running for quite some time. Users are aware that the job takes time to complete, and the server also tends to run out of tempdb space because multiple jobs do the same activity against different huge tables.
I've been assigned to see if there's any way to improve the performance or reduce the issues. When I checked the execution plan of the query, I noticed an Index Spool (Eager Spool) operator that accounts for 90% of the total cost. When I did some research on Index Spools, I came to understand that this can usually be resolved by creating a clustered index on the table. I shared this with the development team and asked them to create the index, but the Index Spool is still there.
Below is the Execution Plan before and after index creation.
Before - https://www.brentozar.com/pastetheplan/?id=HJGGKhdE3
After - https://www.brentozar.com/pastetheplan/?id=B1DIonuVh
Is there anything else I can do, or what have I missed?
May 10, 2023 at 7:55 am
Did you try to run this NOT using MERGE ?
ref: https://michaeljswart.com/2021/08/what-to-avoid-if-you-want-to-use-merge/
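For instance, the upsert could be expressed as a plain UPDATE followed by an INSERT. A minimal sketch, assuming the table and column names that appear later in this thread; the column lists are abbreviated:

```sql
-- Sketch only: the single MERGE replaced by two ordinary statements.
-- Table and column names are taken from the plans posted in this thread;
-- the full column lists are abbreviated here.
BEGIN TRANSACTION;

-- update rows that already exist in the archive
UPDATE arcfst
SET    arcfst.FST_Bin  = tempfst.FST_Bin
     , arcfst.FST_Date = tempfst.FST_Date
     -- ... remaining columns ...
FROM   TraceDB_Arc.dbo.tblFST AS arcfst
JOIN   #TemptblFST AS tempfst
       ON tempfst.GSN_ID = arcfst.GSN_ID;

-- insert rows that do not exist in the archive yet
INSERT INTO TraceDB_Arc.dbo.tblFST (GSN_ID, FST_Bin, FST_Date /* ... */)
SELECT tempfst.GSN_ID, tempfst.FST_Bin, tempfst.FST_Date /* ... */
FROM   #TemptblFST AS tempfst
WHERE  NOT EXISTS (SELECT 1
                   FROM   TraceDB_Arc.dbo.tblFST AS arcfst
                   WHERE  arcfst.GSN_ID = tempfst.GSN_ID);

COMMIT;
```

Each statement then gets its own plan, and neither needs the sorted-input guarantees that often force the spool into a MERGE plan.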
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
May 10, 2023 at 8:02 am
Did you try to run this NOT using MERGE ?
No, I have not. I mainly do general administration for the DB; the query itself was developed by my application team. I haven't asked them to redesign the query yet, as I'm trying to see what else can be improved from the DB side of things.
May 10, 2023 at 8:28 am
If I'm not mistaken, in this case the index spool is there because your archive table is partitioned (on a column other than GSN_ID, most likely TNR_Date), and SQL creates the spool so it can do the processing with the data ordered by partition ID.
There's not much you can do here other than changing how the merge is defined, and eventually even splitting it into 2 separate blocks (update and insert).
But before you go there, I would look at the number of records that are truly being updated, as your merge blindly updates every match even if their contents are exactly the same.
One approach that may reduce the volume of updates is to determine in advance which rows are inserts and which are updates, and at the same time move the DISTINCT sort to a separate step in the process.
One aspect that is also rather important is whether the partition column can be/is changed at source for a given primary key. If it is, the data may be moved from one partition to another (not desirable).
See below for a possible approach.
drop table if exists #TemptblFST2 -- if your version allows it - otherwise use the old "if object_id('tempdb..#TemptblFST2') is not null drop table #TemptblFST2"
-- create temp table with a clear definition of what is an update and what is an insert - and ignore anything not needing to be updated
select top (0)
       tempfst.Die_ID
     , tempfst.GSN_ID
     , tempfst.FST_Lot_ID
     , tempfst.FST_SubLot_ID
     , tempfst.FST_X
     , tempfst.FST_Y
     , tempfst.FST_Bin
     , tempfst.FST_Date
     , tempfst.GsnBeforeFix
     , tempfst.FileID
     , tempfst.TNR_Date -- needed by the insert and the MERGE below, so it must be part of this table
     , cast('A' as char(1)) as Operation
into #TemptblFST2
from #TemptblFST tempfst

create clustered index #TemptblFST2_ix1 on #TemptblFST2
(GSN_ID
)
-- if the partition date column on table TraceDB_Arc.dbo.tblFST never changes then add that column to the index above as well, as the FIRST column
insert into #TemptblFST2 with (tablock)
select tempfst.*
     , case
            when arcfst.GSN_ID is null
            then 'I'
            else 'U'
       end as Operation
from (
     SELECT DISTINCT
            tempfst.Die_ID
          , tempfst.GSN_ID
          , tempfst.FST_Lot_ID
          , tempfst.FST_SubLot_ID
          , tempfst.FST_X
          , tempfst.FST_Y
          , tempfst.FST_Bin
          , tempfst.FST_Date
          , tempfst.GsnBeforeFix
          , tempfst.FileID
          , tempfst.TNR_Date
     FROM #TemptblFST tempfst
     ) AS tempfst
left outer join TraceDB_Arc.dbo.tblFST AS arcfst
  ON tempfst.GSN_ID = arcfst.GSN_ID
outer apply (select case
                         when HASHBYTES('SHA1', concat(tempfst.FST_Lot_ID
                                      , '|', tempfst.FST_SubLot_ID
                                      , '|', tempfst.FST_X
                                      , '|', tempfst.FST_Y
                                      , '|', tempfst.FST_Bin
                                      , '|', convert(varchar(26), tempfst.FST_Date, 121)
                                      , '|', tempfst.GsnBeforeFix
                                      , '|', tempfst.FileID
                                      , '|', convert(varchar(26), tempfst.TNR_Date, 121)
                                      ))
                           <> HASHBYTES('SHA1', concat(arcfst.FST_Lot_ID
                                      , '|', arcfst.FST_SubLot_ID
                                      , '|', arcfst.FST_X
                                      , '|', arcfst.FST_Y
                                      , '|', arcfst.FST_Bin
                                      , '|', convert(varchar(26), arcfst.FST_Date, 121)
                                      , '|', arcfst.GsnBeforeFix
                                      , '|', arcfst.FileID
                                      , '|', convert(varchar(26), arcfst.TNR_Date, 121)
                                      ))
                         then 'Y'
                         else ''
                    end as IsUpdate
            ) flag
where arcfst.GSN_ID is null -- inserts
   or flag.IsUpdate = 'Y'   -- only rows whose contents actually changed
-- at this point, if table #TemptblFST is no longer required for the remainder of the code, explicitly drop it so the tempdb space it used is released
declare @mergeAction table (MergeAction nvarchar(10)) -- receives $action from the OUTPUT clause below

MERGE INTO TraceDB_Arc.dbo.tblFST AS arcfst
USING
(
    SELECT *
    FROM #TemptblFST2
) AS tempfst
-- if the partition column never changes on source then add it to this ON clause as well and remove it from the update list
ON tempfst.GSN_ID = arcfst.GSN_ID
-- and tempfst.partitioncolumn = arcfst.partitioncolumn
WHEN MATCHED
 AND tempfst.Operation = 'U'
THEN UPDATE
     SET
         arcfst.Die_ID = tempfst.Die_ID
       , arcfst.FST_Lot_ID = tempfst.FST_Lot_ID
       , arcfst.FST_SubLot_ID = tempfst.FST_SubLot_ID
       , arcfst.FST_X = tempfst.FST_X
       , arcfst.FST_Y = tempfst.FST_Y
       , arcfst.FST_Bin = tempfst.FST_Bin
       , arcfst.FST_Date = tempfst.FST_Date
       , arcfst.GsnBeforeFix = tempfst.GsnBeforeFix
       , arcfst.FileID = tempfst.FileID
       , arcfst.TNR_Date = tempfst.TNR_Date
       -- GSN_ID is the match key, so there is no need to set it here
WHEN NOT MATCHED THEN
     INSERT
     (
       Die_ID
     , GSN_ID
     , FST_Lot_ID
     , FST_SubLot_ID
     , FST_X
     , FST_Y
     , FST_Bin
     , FST_Date
     , GsnBeforeFix
     , FileID
     , TNR_Date
     )
     VALUES
     (
       tempfst.Die_ID
     , tempfst.GSN_ID
     , tempfst.FST_Lot_ID
     , tempfst.FST_SubLot_ID
     , tempfst.FST_X
     , tempfst.FST_Y
     , tempfst.FST_Bin
     , tempfst.FST_Date
     , tempfst.GsnBeforeFix
     , tempfst.FileID
     , tempfst.TNR_Date
     )
OUTPUT $action into @mergeAction
option (maxdop 4);
May 10, 2023 at 12:27 pm
On top of all this, remember that the execution plan for a MERGE operation is compiled based on the values initially passed in. So you may have a merge that's all INSERTs with no UPDATEs or DELETEs. The plans for the UPDATE and DELETE branches are then based on zero rows, which can possibly be a radically different plan than one based on the actual rows being moved. This is one of the MANY weaknesses of MERGE. You're frequently better off breaking the UPSERT into three different processes, supported by three different queries or procs, so that each compiles independently.
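Sketched out, that three-way split might look like the following. The procedure names are illustrative, the column lists are abbreviated, and the #TemptblFST staging table is assumed to be created by the caller (a temp table created by the caller is visible inside a procedure it calls); CREATE OR ALTER needs SQL Server 2016 SP1 or later:

```sql
-- Sketch: the UPSERT split into independently compiled procedures so each
-- statement gets a plan based on its own row counts.
CREATE OR ALTER PROCEDURE dbo.ArchiveFST_Update
AS
    UPDATE arcfst
    SET    arcfst.FST_Bin = tempfst.FST_Bin  -- ... remaining columns ...
    FROM   TraceDB_Arc.dbo.tblFST AS arcfst
    JOIN   #TemptblFST AS tempfst
           ON tempfst.GSN_ID = arcfst.GSN_ID;
GO

CREATE OR ALTER PROCEDURE dbo.ArchiveFST_Insert
AS
    INSERT INTO TraceDB_Arc.dbo.tblFST (GSN_ID, FST_Bin /* ... */)
    SELECT tempfst.GSN_ID, tempfst.FST_Bin /* ... */
    FROM   #TemptblFST AS tempfst
    WHERE  NOT EXISTS (SELECT 1
                       FROM   TraceDB_Arc.dbo.tblFST AS arcfst
                       WHERE  arcfst.GSN_ID = tempfst.GSN_ID);
GO

-- remove archived rows from the production table once they are safely copied
CREATE OR ALTER PROCEDURE dbo.ArchiveFST_DeleteFromSource
AS
    DELETE src
    FROM   dbo.tblFST AS src
    WHERE  EXISTS (SELECT 1
                   FROM   TraceDB_Arc.dbo.tblFST AS arcfst
                   WHERE  arcfst.GSN_ID = src.GSN_ID);
GO
```

The caller creates #TemptblFST, then executes the three procedures in order, each with its own independently cached plan.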
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
May 10, 2023 at 1:13 pm
If partitioning IS present, why is there a need for any of this? SWITCH the old data out. Copy it to a new table in the archive. Then SWITCH that into the archive table and drop the old data that was switched out in the prod database just by dropping the switched out table.
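A rough sketch of that sequence follows. SWITCH is a metadata-only operation but works only within a single database, so the hop into the archive DB still needs one physical copy; the partition number, staging table names, and database context here are illustrative:

```sql
-- Sketch of the SWITCH-out / copy / SWITCH-in sequence. Both sides of a
-- SWITCH must live in the same database and filegroup, and the target of
-- step 3 needs a matching empty partition (or check constraint).

-- 1) Production DB: switch the old partition out to a staging table
--    with an identical structure on the same filegroup.
ALTER TABLE dbo.tblFST SWITCH PARTITION 2 TO dbo.tblFST_SwitchedOut;

-- 2) Copy the switched-out rows into a staging table in the archive DB
--    (the only physically expensive step).
INSERT INTO TraceDB_Arc.dbo.tblFST_Staging WITH (TABLOCK)
SELECT * FROM dbo.tblFST_SwitchedOut;

-- 3) Archive DB (run in the TraceDB_Arc context): switch the staging
--    table into the archive table's corresponding empty partition.
ALTER TABLE dbo.tblFST_Staging SWITCH TO dbo.tblFST PARTITION 2;

-- 4) Production DB: drop the switched-out table; the old data disappears
--    from production as a metadata-only operation.
DROP TABLE dbo.tblFST_SwitchedOut;
```

Only step 2 touches the data row by row; steps 1, 3, and 4 are near-instant metadata changes, which also keeps tempdb out of the picture entirely.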
--Jeff Moden
Change is inevitable... Change for the better is not.
May 10, 2023 at 2:33 pm
If partitioning IS present, why is there a need for any of this? SWITCH the old data out. Copy it to a new table in the archive. Then SWITCH that into the archive table and drop the old data that was switched out in the prod database just by dropping the switched out table.
Hi Jeff,
I wish I could tell you why MERGE is being used even though partitioning exists on the table. The problem is that this database has been around for some time now and has been managed by several past DBAs and developers, most of whom are no longer around for me to ask. The current application owners just manage the application as it is, adding and updating queries which they think work for them, without considering the impact on the database. I ran Brent's sp_Blitz when I came onboard to do a health check on the DB, and it is a mess.
Being a non-developer myself, do advise if I've got your point wrong.
Do you mean they can just copy the old data from the prod DB into a new table in the Archive DB and then drop that old data from the prod DB? Then I could proceed to copy the data from that new table into the Archive DB's archive table?