July 21, 2016 at 11:11 am
Guys-
I have the following MERGE that I created to help speed things up, using a MIN(date) parameter so that fewer rows have to be scanned. However, I'm still seeing 20 minutes or more to complete. Any ideas on how I can enforce the date logic so the merge only touches the rows covered by the temp table instead of scanning the entire primary table? We are talking about 12 to 20 million rows merged daily.
WITH Mergetst AS (SELECT *, min(Temp.Meter_Read_Dttm) as MinDate
FROM Meter_Read_KWH_Hourly_Temp AS Temp Group by SDP_Id,
SDP_EffBegDate,
Meter_Read_Status_Id,
ETL_Batch_Id,
Src_Del_Flag,
Meter_Read_Dttm,
Meter_Read)
MERGE INTO Meter_Read_KWH_Hourly AS Hourly
USING Mergetst
ON Hourly.SDP_ID = Mergetst.sdp_id
and Hourly.Meter_Read_Dttm = Mergetst.Meter_Read_Dttm
and Hourly.Meter_Read_Dttm >= Mergetst.MinDate
WHEN Matched THEN
UPDATE SET
Hourly.SDP_EffBegDate=Mergetst.SDP_EffBegDate,
Hourly.Meter_Read_Status_Id=Mergetst.Meter_Read_Status_Id,
Hourly.Src_Del_Flag=Mergetst.Src_Del_Flag,
Hourly.ETL_Batch_Id=Mergetst.ETL_Batch_Id,
Hourly.Meter_Read=Mergetst.Meter_Read
WHEN not matched THEN
INSERT (
SDP_Id,
SDP_EffBegDate,
Meter_Read_Status_Id,
ETL_Batch_Id,
Src_Del_Flag,
Meter_Read_Dttm,
Meter_Read)
VALUES(
Mergetst.SDP_Id,
Mergetst.SDP_EffBegDate,
Mergetst.Meter_Read_Status_Id,
Mergetst.ETL_Batch_Id,
Mergetst.Src_Del_Flag,
Mergetst.Meter_Read_Dttm,
Mergetst.Meter_Read);
July 21, 2016 at 11:30 am
Read this for better help: http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Any suggestion at this point would be a wild guess.
July 21, 2016 at 11:33 am
Thanks, I don't really want to post 15 files you guys need to review to get an answer. I'll figure it out.
July 21, 2016 at 11:45 am
fergfamster (7/21/2016)
Thanks, I don't really want to post 15 files you guys need to review to get an answer. I'll figure it out.
15 files? Why? It's just DDL for tables and indexes and the execution plan.
Do you realize that you're not aggregating anything? Meter_Read_Dttm, the column inside the MIN() function, is also in the GROUP BY, so MIN(Meter_Read_Dttm) just returns each row's own value and the MinDate predicate in your ON clause filters nothing.
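If what you were after was the earliest reading per SDP while still keeping every staging row, a windowed MIN would give you that without the GROUP BY. This is only a sketch, using nothing but the table and column names from your post:
-- earliest reading per SDP, without collapsing the staging rows
SELECT Temp.*,
       MIN(Temp.Meter_Read_Dttm) OVER (PARTITION BY Temp.SDP_Id) AS MinDate
FROM Meter_Read_KWH_Hourly_Temp AS Temp;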
July 22, 2016 at 11:04 am
fergfamster (7/21/2016)
Thanks, I don't really want to post 15 files you guys need to review to get an answer. I'll figure it out.
Because I have very little experience with MERGE statements and MERGE operations in general, I submit this for your (and my) edification...
Please note I have only tried to recreate the DDL based on the columns in your GROUP BY.
CREATE TABLE Meter_Read_KWH_Hourly_Temp (SDP_Id INT, SDP_EffBegDate DATETIME, Meter_Read_Status_Id INT, ETL_Batch_Id INT, Src_Del_Flag BIT,
    Meter_Read_Dttm DATETIME, Meter_Read INT)
GO
CREATE TABLE Meter_Read_KWH_Hourly (SDP_Id INT, SDP_EffBegDate DATETIME, Meter_Read_Status_Id INT, ETL_Batch_Id INT, Src_Del_Flag BIT,
    Meter_Read_Dttm DATETIME, Meter_Read INT)
GO
I have also added a loop to populate some data. It took our beefy machine 7 minutes to generate 1M rows; you can cut that down if you wish, but based on your statement of 12-20 million rows I figured 1M rows was a good test bed.
What I did was run it with the datetime variable @TheDate = '7/7/2016 01:00:00' to generate 1M rows, then set it to '7/7/2016 02:00:00' and ran it again.
SET NOCOUNT ON;
DECLARE @Count INT
DECLARE @TheDate DATETIME
DECLARE @SDP_EffBegDate DATETIME
DECLARE @Meter_Read_Status_ID INT
DECLARE @ETL_Batch_Id INT
DECLARE @Src_Del_Flag BIT
DECLARE @Meter_Read_Dttm DATETIME
DECLARE @Meter_Read INT
--for when you need to start over
--DELETE Meter_Read_KWH_Hourly_Temp
--DELETE MEter_Read_KWH_Hourly
--Check our data load progress
--SELECT COUNT(1) from Meter_Read_KWH_Hourly_Temp
--SELECT COUNT(1) from Meter_Read_KWH_Hourly
SET @Count=1 -- Tis obvious...
SET @TheDate = '07/07/2016 02:00:00' --I am using a variable because I need to populate previous time data
WHILE @Count < 1000000
BEGIN
--set our values to insert
SET @SDP_EffBegDate=CAST(CAST(MONTH(@TheDate) as VARCHAR(3))+'/1/'+ CAST(YEAR(@TheDate)as VARCHAR(4)) as DATETIME) --This is an arb. date presum. before the meter reading date
SET @Meter_Read_Status_ID = (SELECT ABS(CAST(CAST(NEWID() AS VARBINARY) AS int)) % 10000 ) --change the last number to limit the Random number 1-9999 in this case
SET @ETL_Batch_ID = @Count + 1000 -- again an arbitrary number based upon the batch being processed in this case @count+1000 will do
SET @Src_Del_Flag = (SELECT ABS(CAST(CAST(NEWID() AS VARBINARY) AS int)) % 2) --simple bit field lets randomize it!!!
SET @Meter_Read_Dttm = GETDATE() --THis is the actual date/time of the meter reading
SET @Meter_Read= (SELECT ABS(CAST(CAST(NEWID() AS VARBINARY) AS int)) % 10000000 ) --The actual meter reading lets goto 10Mil!
INSERT INTO Meter_Read_KWH_Hourly_Temp (SDP_Id,SDP_EffBegDate,Meter_Read_Status_Id,ETL_Batch_Id,Src_Del_Flag,Meter_Read_Dttm,Meter_Read)
VALUES(@Count,@SDP_EffBegDate,@Meter_Read_Status_Id,@ETL_Batch_Id,@Src_Del_Flag,@Meter_Read_Dttm,@Meter_Read);
SET @Count=@Count+1
END
;
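As an aside, if anyone wants to build the test data faster, the WHILE loop above can be replaced with a single set-based insert, which should run in seconds rather than minutes. This is only a sketch against the guessed DDL above; the cross join of sys.all_columns is there purely to manufacture a million row numbers:
DECLARE @TheDate DATETIME = '07/07/2016 02:00:00';
;WITH N AS
(
    -- generate 1,000,000 row numbers
    SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_columns AS a
    CROSS JOIN sys.all_columns AS b
)
INSERT INTO Meter_Read_KWH_Hourly_Temp
    (SDP_Id, SDP_EffBegDate, Meter_Read_Status_Id, ETL_Batch_Id, Src_Del_Flag, Meter_Read_Dttm, Meter_Read)
SELECT n,                                                                          -- SDP_Id
       DATEADD(DAY, 1 - DAY(@TheDate), CAST(CAST(@TheDate AS DATE) AS DATETIME)), -- first of the month, as in the loop
       ABS(CHECKSUM(NEWID())) % 10000,                                             -- random status id
       n + 1000,                                                                   -- arbitrary batch id
       ABS(CHECKSUM(NEWID())) % 2,                                                 -- random bit flag
       GETDATE(),                                                                  -- reading timestamp
       ABS(CHECKSUM(NEWID())) % 10000000                                           -- random meter reading
FROM N;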
Then I ran your MERGE script; it took our beefy dev box 14 seconds to run against the 2M rows:
;WITH Mergetst AS (SELECT *, min(Temp.Meter_Read_Dttm) as MinDate
FROM Meter_Read_KWH_Hourly_Temp AS Temp Group by SDP_Id,
SDP_EffBegDate,
Meter_Read_Status_Id,
ETL_Batch_Id,
Src_Del_Flag,
Meter_Read_Dttm,
Meter_Read)
MERGE INTO Meter_Read_KWH_Hourly AS Hourly
USING Mergetst
ON Hourly.SDP_ID = Mergetst.sdp_id
and Hourly.Meter_Read_Dttm = Mergetst.Meter_Read_Dttm
and Hourly.Meter_Read_Dttm >= Mergetst.MinDate
WHEN Matched THEN
UPDATE SET
Hourly.SDP_EffBegDate=Mergetst.SDP_EffBegDate,
Hourly.Meter_Read_Status_Id=Mergetst.Meter_Read_Status_Id,
Hourly.Src_Del_Flag=Mergetst.Src_Del_Flag,
Hourly.ETL_Batch_Id=Mergetst.ETL_Batch_Id,
Hourly.Meter_Read=Mergetst.Meter_Read
WHEN not matched THEN
INSERT (
SDP_Id,
SDP_EffBegDate,
Meter_Read_Status_Id,
ETL_Batch_Id,
Src_Del_Flag,
Meter_Read_Dttm,
Meter_Read)
VALUES(
Mergetst.SDP_Id,
Mergetst.SDP_EffBegDate,
Mergetst.Meter_Read_Status_Id,
Mergetst.ETL_Batch_Id,
Mergetst.Src_Del_Flag,
Mergetst.Meter_Read_Dttm,
Mergetst.Meter_Read);
Execution Plan attached
July 22, 2016 at 11:16 am
Thank you for the effort and trials! The temp table is 23 million rows but the primary table is 23 billion, which is why I thought the date filter would make it faster. However, what I have found is that MERGE breaks all of this up into separate update/insert operations and still scans the entire target regardless. So what took your side 14 seconds is still taking mine 20-30 minutes even with enhanced resources. I am now playing with the following to compare the two approaches and get a better overall test plan. If you have any other ideas that might be better, I would love to hear them. I also tried MERGE with OUTPUT, with little luck.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
UPDATE Hourly SET SDP_EffBegDate=temp.SDP_EffBegDate,
Meter_Read_Status_Id=temp.Meter_Read_Status_Id,
Src_Del_Flag=temp.Src_Del_Flag,
ETL_Batch_Id=temp.ETL_Batch_Id,
Meter_Read=temp.Meter_Read
FROM Meter_Read_KWH_Hourly AS Hourly
INNER JOIN Meter_Read_KWH_Hourly_Temp AS temp
ON Hourly.SDP_Id = temp.SDP_Id AND Hourly.Meter_Read_Dttm = temp.Meter_Read_Dttm
WHERE Hourly.Meter_Read_Dttm >= '2015-05-28 07:00:00';
IF @@ROWCOUNT = 0
BEGIN
INSERT Meter_Read_KWH_Hourly (
SDP_Id, SDP_EffBegDate, Meter_Read_Status_Id, ETL_Batch_Id,
Src_Del_Flag, Meter_Read_Dttm, Meter_Read)
SELECT temp.SDP_Id, temp.SDP_EffBegDate, temp.Meter_Read_Status_Id, temp.ETL_Batch_Id,
temp.Src_Del_Flag, temp.Meter_Read_Dttm, temp.Meter_Read
FROM Meter_Read_KWH_Hourly_Temp AS temp
WHERE NOT EXISTS (SELECT 1 FROM Meter_Read_KWH_Hourly AS Hourly
WHERE Hourly.SDP_Id = temp.SDP_Id AND Hourly.Meter_Read_Dttm = temp.Meter_Read_Dttm);
END
COMMIT TRANSACTION;
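One tweak I still plan to make: derive the cutoff from the staging table rather than hard-coding '2015-05-28 07:00:00'. Roughly like this (just a sketch, @MinDate is only an illustrative name):
DECLARE @MinDate DATETIME;
-- earliest reading in the current staging batch
SELECT @MinDate = MIN(Meter_Read_Dttm)
FROM Meter_Read_KWH_Hourly_Temp;
-- @MinDate would then replace the literal date in the UPDATE's WHERE clause above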
July 22, 2016 at 11:40 am
fergfamster (7/21/2016)
Thanks, I don't really want to post 15 files you guys need to review to get an answer. I'll figure it out.
Seems you haven't figured it out yet... so I suggest you post what was asked for.
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
July 22, 2016 at 11:45 am
fergfamster (7/22/2016)
Thank you for the effort and trials! The temp table is 23 million rows but the primary table is 23 billion, which is why I thought the date filter would make it faster. However, what I have found is that MERGE breaks all of this up into separate update/insert operations and still scans the entire target regardless. So what took your side 14 seconds is still taking mine 20-30 minutes even with enhanced resources. I am now playing with the following to compare the two approaches and get a better overall test plan. If you have any other ideas that might be better, I would love to hear them. I also tried MERGE with OUTPUT, with little luck.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
UPDATE Hourly SET SDP_EffBegDate=temp.SDP_EffBegDate,
Meter_Read_Status_Id=temp.Meter_Read_Status_Id,
Src_Del_Flag=temp.Src_Del_Flag,
ETL_Batch_Id=temp.ETL_Batch_Id,
Meter_Read=temp.Meter_Read
FROM Meter_Read_KWH_Hourly AS Hourly
INNER JOIN Meter_Read_KWH_Hourly_Temp AS temp
ON Hourly.SDP_Id = temp.SDP_Id AND Hourly.Meter_Read_Dttm = temp.Meter_Read_Dttm
WHERE Hourly.Meter_Read_Dttm >= '2015-05-28 07:00:00';
IF @@ROWCOUNT = 0
BEGIN
INSERT Meter_Read_KWH_Hourly (
SDP_Id, SDP_EffBegDate, Meter_Read_Status_Id, ETL_Batch_Id,
Src_Del_Flag, Meter_Read_Dttm, Meter_Read)
SELECT temp.SDP_Id, temp.SDP_EffBegDate, temp.Meter_Read_Status_Id, temp.ETL_Batch_Id,
temp.Src_Del_Flag, temp.Meter_Read_Dttm, temp.Meter_Read
FROM Meter_Read_KWH_Hourly_Temp AS temp
WHERE NOT EXISTS (SELECT 1 FROM Meter_Read_KWH_Hourly AS Hourly
WHERE Hourly.SDP_Id = temp.SDP_Id AND Hourly.Meter_Read_Dttm = temp.Meter_Read_Dttm);
END
COMMIT TRANSACTION;
Perhaps you should look at partitioning the table(s) in question. If you really only need to look at the previous day's data, partitioning by day or even hour would allow you to break up that 23 billion row table for your merge operation. If I get the time I will try to come up with a construct based on what I already have, but if you could post the DDL, it would be very helpful. In the meantime, something along the lines of the sketch below is what I have in mind.
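The names, boundary values and choice of clustered key below are pure guesses since we don't have your DDL, so treat this as a sketch only. The idea is one partition per day so the merge/update only ever has to touch the newest slices of the table:
-- sketch: daily partitions on the reading timestamp
CREATE PARTITION FUNCTION pf_MeterReadDaily (DATETIME)
AS RANGE RIGHT FOR VALUES ('2016-07-20', '2016-07-21', '2016-07-22'); -- extend with new boundaries via a maintenance job
GO
CREATE PARTITION SCHEME ps_MeterReadDaily
AS PARTITION pf_MeterReadDaily ALL TO ([PRIMARY]);
GO
-- rebuild the big table's clustered index on the scheme so partition elimination can kick in
CREATE CLUSTERED INDEX cix_Meter_Read_KWH_Hourly
ON Meter_Read_KWH_Hourly (Meter_Read_Dttm, SDP_Id)
ON ps_MeterReadDaily (Meter_Read_Dttm);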