October 21, 2020 at 3:31 pm
Hi,
I have a very big table and I need to do a merge.
if the primary key and the record source match then I need to update one column, otherwise I need to insert.
I really dont need to do the whole table, but only the part when the data is bigger than a specific day.
I tried, but it is taking too long, because I think it scans the whole table.
--DECLARE @LastSeenDate datetime = @LoadDate - 2 (date)
MERGE
HubHistoryData AS Target --
USING
(
SELECT
DISTINCT
HistoryDataHashKey --
, LoadDate
, RecordSource
, LoadDate
, [HistoryDataPrimaryKey]
FROM
[HISTORY_DATA] --
WHERE
LoadDate = @LoadDate
) AS Source
(
HistoryDataHashKey --
, LoadDate
, RecordSource
, LastSeenDate
, [HistoryDataPrimaryKey]
)
ON
(
Target.[HistoryDataPrimaryKey] = Source.[HistoryDataPrimaryKey] AND --
Target.RecordSource = Source.RecordSource AND
Target.LastSeenDate > @LastSeenDate -- testing date
)
WHEN MATCHED THEN
UPDATE SET LastSeenDate = Source.LastSeenDate
WHEN NOT MATCHED THEN
INSERT (HistoryDataHashKey --
, LoadDate
, RecordSource
, LastSeenDate
, [HistoryDataPrimaryKey] )
VALUES (Source.HistoryDataHashKey --
,Source.LoadDate
,Source.RecordSource
,Source.LastSeenDate
,Source.[HistoryDataPrimaryKey]) -
October 21, 2020 at 3:39 pm
You might find performance improvements if you use this method instead.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
October 21, 2020 at 6:04 pm
why would you need extra field in the join? Shouldn't the primary key and Record Source be enough?
Target.[HistoryDataPrimaryKey] = Source.[HistoryDataPrimaryKey] AND --
Target.RecordSource = Source.RecordSource AND
then
when matched and Target.LastSeenDate > @LastSeenDate then update..
What does the execution plan look like?? How often does this run? How big are the blocks of data between runs?
For better, quicker answers, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
October 21, 2020 at 8:27 pm
PhiI am trying the method on the link, but I am not sure how to bring the where without scanning the whole table every time
UPDATE HubHistoryData WITH (UPDLOCK, SERIALIZABLE) SET LastSeenDate = @LoadDate
WHERE = @key;
Mike, I tried that one first, and it takes the same time which is almost 4 hours, that is what trying to avoid.
October 21, 2020 at 9:04 pm
Rather than assuming what the optimiser is doing, please check the execution plans to be sure.
What indexes and keys do you have on HubHistoryData?
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
October 21, 2020 at 10:11 pm
If possible I always revert to standard UPDATE and INSERT instead of using a MERGE, this will give better performance. This is also the recommendation from Microsoft: "Performance Tip: The conditional behavior described for the MERGE statement works best when the two tables have a complex mixture of matching characteristics. For example, inserting a row if it doesn't exist, or updating a row if it matches. When simply updating one table based on the rows of another table, improve the performance and scalability with basic INSERT, UPDATE, and DELETE statements."
https://docs.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql?view=sql-server-ver15
So something like this should perform better:
UPDATE Target
SET Target.LastSeenDate = Source.LastSeenDate
FROM HubHistoryData Target
INNER JOIN [HISTORY_DATA] Source
ON Target.[HistoryDataPrimaryKey] = Source.[HistoryDataPrimaryKey]
AND Target.RecordSource = Source.RecordSource
AND Target.LastSeenDate < Source.LastSeenDate
AND Source.LastSeenDate > @LastSeenDate;
INSERT INTO HubHistoryData
(
HistoryDataHashKey,
LoadDate,
RecordSource,
LastSeenDate,
[HistoryDataPrimaryKey]
)
SELECT Source.HistoryDataHashKey,
Source.LoadDate,
Source.RecordSource,
Source.LastSeenDate,
Source.[HistoryDataPrimaryKey]
FROM [HISTORY_DATA] Source
WHERE NOT EXISTS (SELECT *
FROM HubHistoryData Target
WHERE Target.[HistoryDataPrimaryKey] = Source.[HistoryDataPrimaryKey]
AND Target.RecordSource = Source.RecordSource
AND Target.LastSeenDate = Source.LastSeenDate)
AND Source.LastSeenDate > @LastSeenDate;
I wasn't sure what you were doing with the comparison to @LastSeenDate. I think you need to check the Source.LastSeenDate row is greater than @LastSeenDate, so I've put that in the code instead of what you had.
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply