July 25, 2016 at 11:55 am
Guys-
I have a table with 23 billion rows so Merge is out, as it scans the entire thing and has become a hog. So I'm going back in time to 2005 and before for this example.
The question that I have is, as the statement loops over the update, and the @@rowcount is hit, how can I use those values it tried updating with the insert statement section without rewriting a secondary query? In other words, I am performing an update first and if that is not performed I want to then insert that same row. The below is what seems to be an accepted answer from before Merge. The temp table can have close to 23 million records and the primary keys are SDP_ID and Meter_Read_Dttm. Any help or pointers on how to do this efficiently are appreciated.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
UPDATE Primary_Table SET SDP_EffBegDate=temp.SDP_EffBegDate,
Meter_Read_Status_Id=temp.Meter_Read_Status_Id,
Src_Del_Flag=temp.Src_Del_Flag,
ETL_Batch_Id=temp.ETL_Batch_Id,
Meter_Read=temp.Meter_Read
FROM Temp_Table temp, Primary_Table AS Hourly
WHERE hourly.SDP_Id = temp.SDP_Id and Hourly.Meter_Read_Dttm = Temp.Meter_Read_Dttm and Hourly.Meter_Read_Dttm >= '2016-01-01 00:00:00';
IF @@ROWCOUNT = 0
BEGIN
INSERT Meter_Read_KWH_Hourly (
SDP_Id,
SDP_EffBegDate,
Meter_Read_Status_Id,
ETL_Batch_Id,
Src_Del_Flag,
Meter_Read_Dttm,
Meter_Read) (Select clause or variables or?);
END
COMMIT TRANSACTION;
July 25, 2016 at 12:20 pm
fergfamster (7/25/2016)
Guys-I have a table with 23 billion rows so Merge is out, as it scans the entire thing and has become a hog. So I'm going back in time to 2005 and before for this example.
The question that I have is, as the statement loops over the update, and the @@rowcount is hit, how can I use those values it tried updating with the insert statement section without rewriting a secondary query? In other words, I am performing an update first and if that is not performed I want to then insert that same row. The below is what seems to be an accepted answer from before Merge. The temp table can have close to 23 million records and the primary keys are SDP_ID and Meter_Read_Dttm. Any help or pointers on how to do this efficiently are appreciated.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
UPDATE Primary_Table SET SDP_EffBegDate=temp.SDP_EffBegDate,
Meter_Read_Status_Id=temp.Meter_Read_Status_Id,
Src_Del_Flag=temp.Src_Del_Flag,
ETL_Batch_Id=temp.ETL_Batch_Id,
Meter_Read=temp.Meter_Read
FROM Temp_Table temp, Primary_Table AS Hourly
WHERE hourly.SDP_Id = temp.SDP_Id and Hourly.Meter_Read_Dttm = Temp.Meter_Read_Dttm and Hourly.Meter_Read_Dttm >= '2016-01-01 00:00:00';
IF @@ROWCOUNT = 0
BEGIN
INSERT Meter_Read_KWH_Hourly (
SDP_Id,
SDP_EffBegDate,
Meter_Read_Status_Id,
ETL_Batch_Id,
Src_Del_Flag,
Meter_Read_Dttm,
Meter_Read) (Select clause or variables or?);
END
COMMIT TRANSACTION;
This is basically what I would do, but there appears to be some issues with your UPDATE.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
UPDATE Primary_Table SET
SDP_EffBegDate = temp.SDP_EffBegDate,
Meter_Read_Status_Id = temp.Meter_Read_Status_Id,
Src_Del_Flag = temp.Src_Del_Flag,
ETL_Batch_Id = temp.ETL_Batch_Id,
Meter_Read = temp.Meter_Read
FROM
Temp_Table temp
INNER JOIN Primary_Table AS Hourly
on (hourly.SDP_Id = temp.SDP_Id and
Hourly.Meter_Read_Dttm = Temp.Meter_Read_Dttm and
Hourly.Meter_Read_Dttm >= '2016-01-01 00:00:00');
--IF @@ROWCOUNT = 0 -- What if there are values that need to be updated as well as inserted from Temp_Table?
INSERT Meter_Read_KWH_Hourly (
SDP_Id,
SDP_EffBegDate,
Meter_Read_Status_Id,
ETL_Batch_Id,
Src_Del_Flag,
Meter_Read_Dttm,
Meter_Read)
Select
temp.SDP_Id,
temp.SDP_EffBegDate,
temp.Meter_Read_Status_Id,
temp.ETL_Batch_Id,
temp.Src_Del_Flag,
temp.Meter_Read_Dttm,
temp.Meter_Read
FROM
Temp_Table temp
WHERE
NOT EXISTS(SELECT 1 FROM Primary_Table Hourly
WHERE
hourly.SDP_Id = temp.SDP_Id and
Hourly.Meter_Read_Dttm = Temp.Meter_Read_Dttm);
COMMIT TRANSACTION;
Figured out my issue with your UPDATE statement, I couldn't see the equals. Please use white space, the computer may not care but it makes things easier to read by humans.
July 25, 2016 at 12:21 pm
http://source.entelect.co.za/why-is-this-upsert-code-broken
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 25, 2016 at 1:03 pm
Thanks Lynn, Ill try harder to use whitespace the next time I post.... You dont think using the @@rowcount is useful? Better to just do both statements?
July 25, 2016 at 1:04 pm
Thanks Gail, useful article!
July 25, 2016 at 1:15 pm
fergfamster (7/25/2016)
You dont think using the @@rowcount is useful? Better to just do both statements?
No, because what if there's 6 rows that need inserting and 4 updating? To use rowcount, you have to loop over those 10 rows. With Lynn's way you just run both statements, the update affects the 4 existing rows and the insert affects the 6 rows that weren't in the table at the start.
The exists needs lock hints though (as in my article), otherwise there's still a chance of duplicates deadlocks if there are multiple sessions running this concurrently.
Edit: corrected last line
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 25, 2016 at 1:16 pm
Thanks Guys and Gals! Appreciate the help!
July 25, 2016 at 1:32 pm
GilaMonster (7/25/2016)
fergfamster (7/25/2016)
You dont think using the @@rowcount is useful? Better to just do both statements?No, because what if there's 6 rows that need inserting and 4 updating? To use rowcount, you have to loop over those 10 rows. With Lynn's way you just run both statements, the update affects the 4 existing rows and the insert affects the 6 rows that weren't in the table at the start.
The exists needs lock hints though (as in my article), otherwise there's still a chance of duplicates if there are multiple sessions running this concurrently.
Even with this? Just want to be sure.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
July 25, 2016 at 1:37 pm
Lynn Pettis (7/25/2016)
GilaMonster (7/25/2016)
fergfamster (7/25/2016)
You dont think using the @@rowcount is useful? Better to just do both statements?No, because what if there's 6 rows that need inserting and 4 updating? To use rowcount, you have to loop over those 10 rows. With Lynn's way you just run both statements, the update affects the 4 existing rows and the insert affects the 6 rows that weren't in the table at the start.
The exists needs lock hints though (as in my article), otherwise there's still a chance of duplicates if there are multiple sessions running this concurrently.
Even with this? Just want to be sure.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
Sorry, not duplicates. Deadlocks.
Multiple sessions take a shared range lock, multiple sessions go to convert the lock to X for the insert, can't be converted due to the shared range locks, instant deadlock.
I'd recommend use the locking hints and not the elevated isolation level, and it only needs the transaction if it's there for rollbacks on errors (and then with a try catch block).
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 25, 2016 at 1:49 pm
GilaMonster (7/25/2016)
Lynn Pettis (7/25/2016)
GilaMonster (7/25/2016)
fergfamster (7/25/2016)
You dont think using the @@rowcount is useful? Better to just do both statements?No, because what if there's 6 rows that need inserting and 4 updating? To use rowcount, you have to loop over those 10 rows. With Lynn's way you just run both statements, the update affects the 4 existing rows and the insert affects the 6 rows that weren't in the table at the start.
The exists needs lock hints though (as in my article), otherwise there's still a chance of duplicates if there are multiple sessions running this concurrently.
Even with this? Just want to be sure.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
Sorry, not duplicates. Deadlocks.
Multiple sessions take a shared range lock, multiple sessions go to convert the lock to X for the insert, can't be converted due to the shared range locks, instant deadlock.
I'd recommend use the locking hints and not the elevated isolation level, and it only needs the transaction if it's there for rollbacks on errors (and then with a try catch block).
I have started code for deadlocks in some of the code I have written lately. Unfortunately, when it comes to some of the developers code that is embedded in the code, not much luck as I don't get to see it before it goes to production. If there are problems in production I usually can't see it there either as the production software runs on secured networks.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply