October 1, 2018 at 12:57 pm
Hello, I have a table where I have one year+ of data, and I need to delete the current month, then reload it, and when the next month comes, the delete statement will roll to the next month to delete. I was working first with a controlled delete against a table like this, but need help with the syntax of the qualify rownumber type of coding that I think is whats needed:
while 1=1 begin
delete from [database].[dbo].[TableName]
where [actvt_ym] between 201809 and 201810
if @@rowcount = 0 break;
end
Thank you
October 1, 2018 at 1:18 pm
The delete you've coded does ALL the needed deletes in one fell swoop, and having a loop isn't going to help with that DELETE. You do need a row number equivalent to help you. Something like this:WHILE 1 = 1
BEGIN;
WITH RowNums AS (
SELECT TOP 1 PERCENT
TN.PrimaryKeyColumnName
FROM [database].[dbo].[TableName] AS TN
WHERE TN.actvt_ym >= 201809
AND TN.actvt_ym < 201810
ORDER BY TN.PrimaryKeyColumnName
)
DELETE TN
FROM RowNums AS RN
INNER JOIN [database].[dbo].[TableName] AS TN
ON RN.PrimaryKeyColumnName = TN.PrimaryKeyColumnName;
IF @@ROWCOUNT = 0
BEGIN
BREAK;
END;
END;
You can adjust the percentage up or down based on your knowledge of the total row count for that month.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
October 1, 2018 at 1:51 pm
sgmunson - Monday, October 1, 2018 1:18 PMThe delete you've coded does ALL the needed deletes in one fell swoop, and having a loop isn't going to help with that DELETE. You do need a row number equivalent to help you. Something like this:WHILE 1 = 1
BEGIN;
WITH RowNums AS (SELECT TOP 1 PERCENT
TN.PrimaryKeyColumnName
FROM [database].[dbo].[TableName] AS TN
WHERE TN.actvt_ym >= 201809
AND TN.actvt_ym < 201810
ORDER BY TN.PrimaryKeyColumnName
)
DELETE TN
FROM RowNums AS RN
INNER JOIN [database].[dbo].[TableName] AS TN
ON RN.PrimaryKeyColumnName = TN.PrimaryKeyColumnName;
IF @@ROWCOUNT = 0
BEGIN
BREAK;
END;
END;You can adjust the percentage up or down based on your knowledge of the total row count for that month.
I also see no reason to use a ROW_NUMBER (which requires an expensive sort). The main reason for using a ROW_NUMBER() here would be to help track which records have already been processed, but you don't need to track them, because they've already been deleted and are no longer available to be processed on subsequent loops.
There also no reason to do the self join. It just adds unnecessary overhead and doesn't gain you anything.
DECLARE
@cur_count INTEGER,
@batch_size INTEGER = 50000,
@date_begin INTEGER = CONVERT(CHAR(6), DATEADD(MONTH, -1, GETDATE()), 112),
@date_end INTEGER = CONVERT(CHAR(6), GETDATE(), 112);
WHILE @cur_count = @batch_size
BEGIN
DELETE TOP (@batch_size)
FROM [database].[dbo].[TableName]
WHERE [actvt_ym] BETWEEN @date_begin AND @date_end;
SET @cur_count = @@ROWCOUNT;
end
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
October 1, 2018 at 1:54 pm
But lets say we were in Oct18 and now Nov18, how would the code know that the month changed, and would know to change from Oct18? I would think that needs to be dynamic, and this doesn't look like that, or I'm not getting it
October 1, 2018 at 1:55 pm
quinn.jay - Monday, October 1, 2018 1:54 PMBut lets say we were in Oct18 and now Nov18, how would the code know that the month changed, and would know to change from Oct18? I would think that needs to be dynamic, and this doesn't look like that, or I'm not getting it
Something very simple like this perhaps?:
delete from [database].[dbo].[TableName]where [actvt_ym] = FORMAT(GETDATE(),'yyyyMM')*1
October 1, 2018 at 2:08 pm
You don't really need a row number - nor do you need a break...
Declare @rowsAffected int = 0
, @batchSize int = 200000 -- adjust for your system
, @activityYearMonth int = cast(convert(char(6), getdate(), 112) As int);
While @rowsAffected > 0
Begin
Delete Top(@batchSize)
From [database].[dbo].[TableName]
Where [actvt_ym] = @activityYearMonth;
Set @rowsAffected = @@rowcount;
Checkpoint; -- Or - issue a tlog backup here if database is full recovery
End;
In your original code - you used between and 2 periods. Because BETWEEN is inclusive that delete would remove rows for both periods and I assume you only want it to remove data for the current period.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
October 2, 2018 at 5:09 am
Jeffrey Williams 3188 - Monday, October 1, 2018 2:08 PMYou don't really need a row number - nor do you need a break...
Declare @rowsAffected int = 0
, @batchSize int = 200000 -- adjust for your system
, @activityYearMonth int = cast(convert(char(6), getdate(), 112) As int);While @rowsAffected > 0
BeginDelete Top(@batchSize)
From [database].[dbo].[TableName]
Where [actvt_ym] = @activityYearMonth;Set @rowsAffected = @@rowcount;
Checkpoint; -- Or - issue a tlog backup here if database is full recovery
End;In your original code - you used between and 2 periods. Because BETWEEN is inclusive that delete would remove rows for both periods and I assume you only want it to remove data for the current period.
Check the @rowsAffected initial setting - the loop never executes.
October 2, 2018 at 5:28 am
quinn.jay - Monday, October 1, 2018 1:55 PMquinn.jay - Monday, October 1, 2018 1:54 PMBut lets say we were in Oct18 and now Nov18, how would the code know that the month changed, and would know to change from Oct18? I would think that needs to be dynamic, and this doesn't look like that, or I'm not getting itSomething very simple like this perhaps?:
delete from [database].[dbo].[TableName]where [actvt_ym] = FORMAT(GETDATE(),'yyyyMM')*1
If all that you want is 'current' year/month in YYYYMM format (as an INT), you can use this:
YEAR(GETDATE()) * 100 + MONTH(GETDATE())
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
October 2, 2018 at 7:19 am
gvoshol 73146 - Tuesday, October 2, 2018 5:09 AMJeffrey Williams 3188 - Monday, October 1, 2018 2:08 PMYou don't really need a row number - nor do you need a break...
Declare @rowsAffected int = 0
, @batchSize int = 200000 -- adjust for your system
, @activityYearMonth int = cast(convert(char(6), getdate(), 112) As int);While @rowsAffected > 0
BeginDelete Top(@batchSize)
From [database].[dbo].[TableName]
Where [actvt_ym] = @activityYearMonth;Set @rowsAffected = @@rowcount;
Checkpoint; -- Or - issue a tlog backup here if database is full recovery
End;In your original code - you used between and 2 periods. Because BETWEEN is inclusive that delete would remove rows for both periods and I assume you only want it to remove data for the current period.
Check the @rowsAffected initial setting - the loop never executes.
Typo, it should be initialized to 1...
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
October 2, 2018 at 8:17 am
drew.allen - Monday, October 1, 2018 1:51 PMsgmunson - Monday, October 1, 2018 1:18 PMThe delete you've coded does ALL the needed deletes in one fell swoop, and having a loop isn't going to help with that DELETE. You do need a row number equivalent to help you. Something like this:WHILE 1 = 1
BEGIN;
WITH RowNums AS (SELECT TOP 1 PERCENT
TN.PrimaryKeyColumnName
FROM [database].[dbo].[TableName] AS TN
WHERE TN.actvt_ym >= 201809
AND TN.actvt_ym < 201810
ORDER BY TN.PrimaryKeyColumnName
)
DELETE TN
FROM RowNums AS RN
INNER JOIN [database].[dbo].[TableName] AS TN
ON RN.PrimaryKeyColumnName = TN.PrimaryKeyColumnName;
IF @@ROWCOUNT = 0
BEGIN
BREAK;
END;
END;You can adjust the percentage up or down based on your knowledge of the total row count for that month.
I would not use PERCENT in a loop where you are deleting records,because the number of records you process each loop will decrease exponentially. You want to process approximately the same number of records each loop, so you should just use a set value.I also see no reason to use a ROW_NUMBER (which requires an expensive sort). The main reason for using a ROW_NUMBER() here would be to help track which records have already been processed, but you don't need to track them, because they've already been deleted and are no longer available to be processed on subsequent loops.
There also no reason to do the self join. It just adds unnecessary overhead and doesn't gain you anything.
DECLARE
@cur_count INTEGER,
@batch_size INTEGER = 50000,
@date_begin INTEGER = CONVERT(CHAR(6), DATEADD(MONTH, -1, GETDATE()), 112),
@date_end INTEGER = CONVERT(CHAR(6), GETDATE(), 112);SET @cur_count = @batch_sizeWHILE @cur_count = @batch_size
BEGIN
DELETE TOP (@batch_size)
FROM [database].[dbo].[TableName]
WHERE [actvt_ym] BETWEEN @date_begin AND @date_end;SET @cur_count = @@ROWCOUNT;
endDrew
Oops! Kinda forgot about the impact of what using a percentage would do. Here's my corrected code, based on yours, that addresses the date issue:DECLARE
@RowCount AS int = 1,
@BatchSize AS int = 50000,
@Today AS date = '20180901';
DECLARE
@StartMonth AS int = CONVERT(char(6), DATEFROMPARTS(YEAR(@Today), MONTH(@Today), 1), 112),
@EndMonth AS int = CONVERT(char(6), DATEADD(month, 1, @Today), 112);
WHILE @RowCount > 0
BEGIN
DELETE TOP (@BatchSize)
FROM [database].[schema].[TableName] AS TN
WHERE TN.actvt_ym >= @StartMonth
AND TN.actvt_ym < @EndMonth;
SET @RowCount = @@ROWCOUNT;
END;
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply