July 21, 2016 at 11:11 am
Guys-
I have the following MERGE that I created to help speed things up, using a MIN(date) parameter so that fewer rows have to be scanned. However, I'm still seeing 20 minutes or more to complete. Any ideas on how I can enforce the date logic so the merge only touches the rows covered by the temp table instead of scanning the entire primary table? We are talking about 12 to 20 million rows merged daily.
WITH Mergetst AS (SELECT *, min(Temp.Meter_Read_Dttm) as MinDate
FROM Meter_Read_KWH_Hourly_Temp AS Temp Group by SDP_Id,
SDP_EffBegDate,
Meter_Read_Status_Id,
ETL_Batch_Id,
Src_Del_Flag,
Meter_Read_Dttm,
Meter_Read)
MERGE INTO Meter_Read_KWH_Hourly AS Hourly
USING Mergetst
ON Hourly.SDP_ID = Mergetst.sdp_id
and Hourly.Meter_Read_Dttm = Mergetst.Meter_Read_Dttm
and Hourly.Meter_Read_Dttm >= Mergetst.MinDate
WHEN Matched THEN
UPDATE SET
Hourly.SDP_EffBegDate=Mergetst.SDP_EffBegDate,
Hourly.Meter_Read_Status_Id=Mergetst.Meter_Read_Status_Id,
Hourly.Src_Del_Flag=Mergetst.Src_Del_Flag,
Hourly.ETL_Batch_Id=Mergetst.ETL_Batch_Id,
Hourly.Meter_Read=Mergetst.Meter_Read
WHEN not matched THEN
INSERT (
SDP_Id,
SDP_EffBegDate,
Meter_Read_Status_Id,
ETL_Batch_Id,
Src_Del_Flag,
Meter_Read_Dttm,
Meter_Read)
VALUES(
Mergetst.SDP_Id,
Mergetst.SDP_EffBegDate,
Mergetst.Meter_Read_Status_Id,
Mergetst.ETL_Batch_Id,
Mergetst.Src_Del_Flag,
Mergetst.Meter_Read_Dttm,
Mergetst.Meter_Read);
July 21, 2016 at 11:30 am
Read this for better help: http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Any suggestion at this point would be a wild guess.
July 21, 2016 at 11:33 am
Thanks, I don't really want to post 15 files you guys need to review to get an answer. I'll figure it out.
July 21, 2016 at 11:45 am
fergfamster (7/21/2016)
Thanks, I don't really want to post 15 files you guys need to review to get an answer. I'll figure it out.
15 files? Why? It's just DDL for tables and indexes and the execution plan.
Do you realize that you're not aggregating anything? Meter_Read_Dttm, the column inside the MIN() function, is also in the GROUP BY, so MIN(Meter_Read_Dttm) just returns each row's own value and the MinDate predicate in your ON clause filters nothing.
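If what you were after was the earliest reading per SDP while still keeping every staging row, a windowed MIN would give you that without the GROUP BY. This is only a sketch, using nothing but the table and column names from your post:
-- earliest reading per SDP, without collapsing the staging rows
SELECT Temp.*,
       MIN(Temp.Meter_Read_Dttm) OVER (PARTITION BY Temp.SDP_Id) AS MinDate
FROM Meter_Read_KWH_Hourly_Temp AS Temp;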
July 22, 2016 at 11:04 am
fergfamster (7/21/2016)
Thanks, I don't really want to post 15 files you guys need to review to get an answer. I'll figure it out.
Because I have very little experience with MERGE statements and MERGE operations in general, I submit this for your (and my) edification...
Please note I have only tried to recreate the DDL based on the columns in your GROUP BY.
CREATE TABLE Meter_Read_KWH_Hourly_Temp (SDP_Id INT, SDP_EffBegDate DATETIME, Meter_Read_Status_Id INT, ETL_Batch_Id INT, Src_Del_Flag BIT,
    Meter_Read_Dttm DATETIME, Meter_Read INT)
GO
CREATE TABLE Meter_Read_KWH_Hourly (SDP_Id INT, SDP_EffBegDate DATETIME, Meter_Read_Status_Id INT, ETL_Batch_Id INT, Src_Del_Flag BIT,
    Meter_Read_Dttm DATETIME, Meter_Read INT)
GO
I have also added a loop to populate some data. It took our beefy machine 7 minutes to generate 1M rows; you can cut that down if you wish, but based on your statement of 12-20 million rows I figured 1M rows was a good test bed.
What I did was run it with the datetime variable @TheDate = '7/7/2016 01:00:00' to generate 1M rows, then set it to '7/7/2016 02:00:00' and ran it again.
SET NOCOUNT ON;
DECLARE @Count INT
DECLARE @TheDate DATETIME
DECLARE @SDP_EffBegDate DATETIME
DECLARE @Meter_Read_Status_ID INT
DECLARE @ETL_Batch_Id INT
DECLARE @Src_Del_Flag BIT
DECLARE @Meter_Read_Dttm DATETIME
DECLARE @Meter_Read INT
--for when you need to start over
--DELETE Meter_Read_KWH_Hourly_Temp
--DELETE MEter_Read_KWH_Hourly
--Check our data load progress
--SELECT COUNT(1) from Meter_Read_KWH_Hourly_Temp
--SELECT COUNT(1) from Meter_Read_KWH_Hourly
SET @Count=1 -- Tis obvious...
SET @TheDate = '07/07/2016 02:00:00' --I am using a variable because I need to populate previous time data
WHILE @Count < 1000000
BEGIN
--set our values to insert
SET @SDP_EffBegDate=CAST(CAST(MONTH(@TheDate) as VARCHAR(3))+'/1/'+ CAST(YEAR(@TheDate)as VARCHAR(4)) as DATETIME) --This is an arb. date presum. before the meter reading date
SET @Meter_Read_Status_ID = (SELECT ABS(CAST(CAST(NEWID() AS VARBINARY) AS int)) % 10000 ) --change the last number to limit the Random number 1-9999 in this case
SET @ETL_Batch_ID = @Count + 1000 -- again an arbitrary number based upon the batch being processed in this case @count+1000 will do
SET @Src_Del_Flag = (SELECT ABS(CAST(CAST(NEWID() AS VARBINARY) AS int)) % 2) --simple bit field lets randomize it!!!
SET @Meter_Read_Dttm = GETDATE() --THis is the actual date/time of the meter reading
SET @Meter_Read= (SELECT ABS(CAST(CAST(NEWID() AS VARBINARY) AS int)) % 10000000 ) --The actual meter reading lets goto 10Mil!
INSERT INTO Meter_Read_KWH_Hourly_Temp (SDP_Id,SDP_EffBegDate,Meter_Read_Status_Id,ETL_Batch_Id,Src_Del_Flag,Meter_Read_Dttm,Meter_Read)
VALUES(@Count,@SDP_EffBegDate,@Meter_Read_Status_Id,@ETL_Batch_Id,@Src_Del_Flag,@Meter_Read_Dttm,@Meter_Read);
SET @Count=@Count+1
END
;
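As an aside, if anyone wants to build the test data faster, the WHILE loop above can be replaced with a single set-based insert, which should run in seconds rather than minutes. This is only a sketch against the guessed DDL above; the cross join of sys.all_columns is there purely to manufacture a million row numbers:
DECLARE @TheDate DATETIME = '07/07/2016 02:00:00';
;WITH N AS
(
    -- generate 1,000,000 row numbers
    SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_columns AS a
    CROSS JOIN sys.all_columns AS b
)
INSERT INTO Meter_Read_KWH_Hourly_Temp
    (SDP_Id, SDP_EffBegDate, Meter_Read_Status_Id, ETL_Batch_Id, Src_Del_Flag, Meter_Read_Dttm, Meter_Read)
SELECT n,                                                                          -- SDP_Id
       DATEADD(DAY, 1 - DAY(@TheDate), CAST(CAST(@TheDate AS DATE) AS DATETIME)), -- first of the month, as in the loop
       ABS(CHECKSUM(NEWID())) % 10000,                                             -- random status id
       n + 1000,                                                                   -- arbitrary batch id
       ABS(CHECKSUM(NEWID())) % 2,                                                 -- random bit flag
       GETDATE(),                                                                  -- reading timestamp
       ABS(CHECKSUM(NEWID())) % 10000000                                           -- random meter reading
FROM N;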
Then I ran your MERGE script; it took our beefy dev box 14 seconds to run against the 2M rows:
;WITH Mergetst AS (SELECT *, min(Temp.Meter_Read_Dttm) as MinDate
FROM Meter_Read_KWH_Hourly_Temp AS Temp Group by SDP_Id,
SDP_EffBegDate,
Meter_Read_Status_Id,
ETL_Batch_Id,
Src_Del_Flag,
Meter_Read_Dttm,
Meter_Read)
MERGE INTO Meter_Read_KWH_Hourly AS Hourly
USING Mergetst
ON Hourly.SDP_ID = Mergetst.sdp_id
and Hourly.Meter_Read_Dttm = Mergetst.Meter_Read_Dttm
and Hourly.Meter_Read_Dttm >= Mergetst.MinDate
WHEN Matched THEN
UPDATE SET
Hourly.SDP_EffBegDate=Mergetst.SDP_EffBegDate,
Hourly.Meter_Read_Status_Id=Mergetst.Meter_Read_Status_Id,
Hourly.Src_Del_Flag=Mergetst.Src_Del_Flag,
Hourly.ETL_Batch_Id=Mergetst.ETL_Batch_Id,
Hourly.Meter_Read=Mergetst.Meter_Read
WHEN not matched THEN
INSERT (
SDP_Id,
SDP_EffBegDate,
Meter_Read_Status_Id,
ETL_Batch_Id,
Src_Del_Flag,
Meter_Read_Dttm,
Meter_Read)
VALUES(
Mergetst.SDP_Id,
Mergetst.SDP_EffBegDate,
Mergetst.Meter_Read_Status_Id,
Mergetst.ETL_Batch_Id,
Mergetst.Src_Del_Flag,
Mergetst.Meter_Read_Dttm,
Mergetst.Meter_Read);
Execution Plan attached
July 22, 2016 at 11:16 am
Thank you for the effort and trials! The temp table is 23 million rows but the primary table is 23 billion, which is why I thought the date filter would make it faster. However, what I have found is that MERGE breaks all of this up into separate update/insert operations and still scans the entire target regardless. So what took your side 14 seconds is still taking mine 20-30 minutes even with enhanced resources. I am now playing with the following to compare the two approaches and get a better overall test plan. If you have any other ideas that might be better, I would love to hear them. I also tried MERGE with OUTPUT, with little luck.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
UPDATE Hourly SET SDP_EffBegDate=temp.SDP_EffBegDate,
Meter_Read_Status_Id=temp.Meter_Read_Status_Id,
Src_Del_Flag=temp.Src_Del_Flag,
ETL_Batch_Id=temp.ETL_Batch_Id,
Meter_Read=temp.Meter_Read
FROM Meter_Read_KWH_Hourly AS Hourly
INNER JOIN Meter_Read_KWH_Hourly_Temp AS temp
ON Hourly.SDP_Id = temp.SDP_Id AND Hourly.Meter_Read_Dttm = temp.Meter_Read_Dttm
WHERE Hourly.Meter_Read_Dttm >= '2015-05-28 07:00:00';
IF @@ROWCOUNT = 0
BEGIN
INSERT Meter_Read_KWH_Hourly (
SDP_Id, SDP_EffBegDate, Meter_Read_Status_Id, ETL_Batch_Id,
Src_Del_Flag, Meter_Read_Dttm, Meter_Read)
SELECT temp.SDP_Id, temp.SDP_EffBegDate, temp.Meter_Read_Status_Id, temp.ETL_Batch_Id,
temp.Src_Del_Flag, temp.Meter_Read_Dttm, temp.Meter_Read
FROM Meter_Read_KWH_Hourly_Temp AS temp
WHERE NOT EXISTS (SELECT 1 FROM Meter_Read_KWH_Hourly AS Hourly
WHERE Hourly.SDP_Id = temp.SDP_Id AND Hourly.Meter_Read_Dttm = temp.Meter_Read_Dttm);
END
COMMIT TRANSACTION;
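One tweak I still plan to make: derive the cutoff from the staging table rather than hard-coding '2015-05-28 07:00:00'. Roughly like this (just a sketch, @MinDate is only an illustrative name):
DECLARE @MinDate DATETIME;
-- earliest reading in the current staging batch
SELECT @MinDate = MIN(Meter_Read_Dttm)
FROM Meter_Read_KWH_Hourly_Temp;
-- @MinDate would then replace the literal date in the UPDATE's WHERE clause above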
July 22, 2016 at 11:40 am
fergfamster (7/21/2016)
Thanks, I don't really want to post 15 files you guys need to review to get an answer. I'll figure it out.
Seems you haven't figured it out yet... so I suggest you post what was asked for.
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
July 22, 2016 at 11:45 am
fergfamster (7/22/2016)
Thank you for the effort and trials! The temp table is 23 million rows but the primary table is 23 billion, which is why I thought the date filter would make it faster. However, what I have found is that MERGE breaks all of this up into separate update/insert operations and still scans the entire target regardless. So what took your side 14 seconds is still taking mine 20-30 minutes even with enhanced resources. I am now playing with the following to compare the two approaches and get a better overall test plan. If you have any other ideas that might be better, I would love to hear them. I also tried MERGE with OUTPUT, with little luck.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
UPDATE Hourly SET SDP_EffBegDate=temp.SDP_EffBegDate,
Meter_Read_Status_Id=temp.Meter_Read_Status_Id,
Src_Del_Flag=temp.Src_Del_Flag,
ETL_Batch_Id=temp.ETL_Batch_Id,
Meter_Read=temp.Meter_Read
FROM Meter_Read_KWH_Hourly AS Hourly
INNER JOIN Meter_Read_KWH_Hourly_Temp AS temp
ON Hourly.SDP_Id = temp.SDP_Id AND Hourly.Meter_Read_Dttm = temp.Meter_Read_Dttm
WHERE Hourly.Meter_Read_Dttm >= '2015-05-28 07:00:00';
IF @@ROWCOUNT = 0
BEGIN
INSERT Meter_Read_KWH_Hourly (
SDP_Id, SDP_EffBegDate, Meter_Read_Status_Id, ETL_Batch_Id,
Src_Del_Flag, Meter_Read_Dttm, Meter_Read)
SELECT temp.SDP_Id, temp.SDP_EffBegDate, temp.Meter_Read_Status_Id, temp.ETL_Batch_Id,
temp.Src_Del_Flag, temp.Meter_Read_Dttm, temp.Meter_Read
FROM Meter_Read_KWH_Hourly_Temp AS temp
WHERE NOT EXISTS (SELECT 1 FROM Meter_Read_KWH_Hourly AS Hourly
WHERE Hourly.SDP_Id = temp.SDP_Id AND Hourly.Meter_Read_Dttm = temp.Meter_Read_Dttm);
END
COMMIT TRANSACTION;
Perhaps you should look at partitioning the table(s) in question. If you really only need to look at the previous day's data, partitioning by day or even hour would allow you to break up that 23 billion row table for your merge operation. If I get the time I will try to come up with a construct based on what I already have, but if you could post the DDL, it would be very helpful. In the meantime, something along the lines of the sketch below is what I have in mind.
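The names, boundary values and choice of clustered key below are pure guesses since we don't have your DDL, so treat this as a sketch only. The idea is one partition per day so the merge/update only ever has to touch the newest slices of the table:
-- sketch: daily partitions on the reading timestamp
CREATE PARTITION FUNCTION pf_MeterReadDaily (DATETIME)
AS RANGE RIGHT FOR VALUES ('2016-07-20', '2016-07-21', '2016-07-22'); -- extend with new boundaries via a maintenance job
GO
CREATE PARTITION SCHEME ps_MeterReadDaily
AS PARTITION pf_MeterReadDaily ALL TO ([PRIMARY]);
GO
-- rebuild the big table's clustered index on the scheme so partition elimination can kick in
CREATE CLUSTERED INDEX cix_Meter_Read_KWH_Hourly
ON Meter_Read_KWH_Hourly (Meter_Read_Dttm, SDP_Id)
ON ps_MeterReadDaily (Meter_Read_Dttm);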