Delete data with sliding date window

  • Hello, I have a table where I have one year+ of data, and I need to delete the current month, then reload it, and when the next month comes, the delete statement will roll to the next month to delete. I was working first with a controlled delete against a table like this, but need help with the syntax of the qualify rownumber type of coding that I think is whats needed:

    while 1=1 begin
    delete from [database].[dbo].[TableName]
    where [actvt_ym] between 201809 and 201810
    if @@rowcount = 0 break;
    end

    Thank you

  • The delete you've coded does ALL the needed deletes in one fell swoop, and having a loop isn't going to help with that DELETE.   You do need a row number equivalent to help you.  Something like this:
    WHILE 1 = 1
        BEGIN;
        WITH RowNums AS (

            SELECT TOP 1 PERCENT
                TN.PrimaryKeyColumnName
            FROM [database].[dbo].[TableName] AS TN
            WHERE    TN.actvt_ym >= 201809
                AND TN.actvt_ym < 201810
            ORDER BY TN.PrimaryKeyColumnName
        )
        DELETE TN
        FROM RowNums AS RN
        INNER JOIN [database].[dbo].[TableName] AS TN
            ON RN.PrimaryKeyColumnName = TN.PrimaryKeyColumnName;
        
        IF @@ROWCOUNT = 0
            BEGIN
            BREAK;
            END;
        END;

    You can adjust the percentage up or down based on your knowledge of the total row count for that month.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • sgmunson - Monday, October 1, 2018 1:18 PM

    The delete you've coded does ALL the needed deletes in one fell swoop, and having a loop isn't going to help with that DELETE.   You do need a row number equivalent to help you.  Something like this:
    WHILE 1 = 1
        BEGIN;
        WITH RowNums AS (

            SELECT TOP 1 PERCENT
                TN.PrimaryKeyColumnName
            FROM [database].[dbo].[TableName] AS TN
            WHERE    TN.actvt_ym >= 201809
                AND TN.actvt_ym < 201810
            ORDER BY TN.PrimaryKeyColumnName
        )
        DELETE TN
        FROM RowNums AS RN
        INNER JOIN [database].[dbo].[TableName] AS TN
            ON RN.PrimaryKeyColumnName = TN.PrimaryKeyColumnName;
        
        IF @@ROWCOUNT = 0
            BEGIN
            BREAK;
            END;
        END;

    You can adjust the percentage up or down based on your knowledge of the total row count for that month.

    I would not use PERCENT in a loop where you are deleting records,because the number of records you process each loop will decrease exponentially.  You want to process approximately the same number of records each loop, so you should just use a set value.

    I also see no reason to use a ROW_NUMBER (which requires an expensive sort).  The main reason for using a ROW_NUMBER() here would be to help track which records have already been processed, but you don't need to track them, because they've already been deleted and are no longer available to be processed on subsequent loops.

    There also no reason to do the self join.  It just adds unnecessary overhead and doesn't gain you anything.

    DECLARE
        @cur_count INTEGER,
        @batch_size INTEGER = 50000,
        @date_begin INTEGER = CONVERT(CHAR(6), DATEADD(MONTH, -1, GETDATE()), 112),
        @date_end INTEGER = CONVERT(CHAR(6), GETDATE(), 112);

    SET @cur_count = @batch_size

    WHILE @cur_count = @batch_size
    BEGIN
        DELETE TOP (@batch_size)
        FROM [database].[dbo].[TableName]
        WHERE [actvt_ym] BETWEEN @date_begin AND @date_end;

        SET @cur_count = @@ROWCOUNT;
    end

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • But lets say we were in Oct18 and now Nov18, how would the code know that the month changed, and would know to change from Oct18? I would think that needs to be dynamic, and this doesn't look like that, or I'm not getting it

  • quinn.jay - Monday, October 1, 2018 1:54 PM

    But lets say we were in Oct18 and now Nov18, how would the code know that the month changed, and would know to change from Oct18? I would think that needs to be dynamic, and this doesn't look like that, or I'm not getting it

    Something very simple like this perhaps?:

    delete from [database].[dbo].[TableName]where [actvt_ym] = FORMAT(GETDATE(),'yyyyMM')*1

  • You don't really need a row number - nor do you need a break...

    Declare @rowsAffected int = 0
      , @batchSize int = 200000   -- adjust for your system
      , @activityYearMonth int = cast(convert(char(6), getdate(), 112) As int);

    While @rowsAffected > 0
    Begin

    Delete Top(@batchSize)
     From [database].[dbo].[TableName]
    Where [actvt_ym] = @activityYearMonth;

      Set @rowsAffected = @@rowcount;
      
       Checkpoint;  -- Or - issue a tlog backup here if database is full recovery
      End;

    In your original code - you used between and 2 periods.  Because BETWEEN is inclusive that delete would remove rows for both periods and I assume you only want it to remove data for the current period.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey Williams 3188 - Monday, October 1, 2018 2:08 PM

    You don't really need a row number - nor do you need a break...

    Declare @rowsAffected int = 0
      , @batchSize int = 200000   -- adjust for your system
      , @activityYearMonth int = cast(convert(char(6), getdate(), 112) As int);

    While @rowsAffected > 0
    Begin

    Delete Top(@batchSize)
     From [database].[dbo].[TableName]
    Where [actvt_ym] = @activityYearMonth;

      Set @rowsAffected = @@rowcount;
      
       Checkpoint;  -- Or - issue a tlog backup here if database is full recovery
      End;

    In your original code - you used between and 2 periods.  Because BETWEEN is inclusive that delete would remove rows for both periods and I assume you only want it to remove data for the current period.

    Check the @rowsAffected initial setting - the loop never executes.

  • quinn.jay - Monday, October 1, 2018 1:55 PM

    quinn.jay - Monday, October 1, 2018 1:54 PM

    But lets say we were in Oct18 and now Nov18, how would the code know that the month changed, and would know to change from Oct18? I would think that needs to be dynamic, and this doesn't look like that, or I'm not getting it

    Something very simple like this perhaps?:

    delete from [database].[dbo].[TableName]where [actvt_ym] = FORMAT(GETDATE(),'yyyyMM')*1

    If all that you want is 'current' year/month in YYYYMM format (as an INT), you can use this:

    YEAR(GETDATE()) * 100 + MONTH(GETDATE())

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • gvoshol 73146 - Tuesday, October 2, 2018 5:09 AM

    Jeffrey Williams 3188 - Monday, October 1, 2018 2:08 PM

    You don't really need a row number - nor do you need a break...

    Declare @rowsAffected int = 0
      , @batchSize int = 200000   -- adjust for your system
      , @activityYearMonth int = cast(convert(char(6), getdate(), 112) As int);

    While @rowsAffected > 0
    Begin

    Delete Top(@batchSize)
     From [database].[dbo].[TableName]
    Where [actvt_ym] = @activityYearMonth;

      Set @rowsAffected = @@rowcount;
      
       Checkpoint;  -- Or - issue a tlog backup here if database is full recovery
      End;

    In your original code - you used between and 2 periods.  Because BETWEEN is inclusive that delete would remove rows for both periods and I assume you only want it to remove data for the current period.

    Check the @rowsAffected initial setting - the loop never executes.

    Typo, it should be initialized to 1...

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • drew.allen - Monday, October 1, 2018 1:51 PM

    sgmunson - Monday, October 1, 2018 1:18 PM

    The delete you've coded does ALL the needed deletes in one fell swoop, and having a loop isn't going to help with that DELETE.   You do need a row number equivalent to help you.  Something like this:
    WHILE 1 = 1
        BEGIN;
        WITH RowNums AS (

            SELECT TOP 1 PERCENT
                TN.PrimaryKeyColumnName
            FROM [database].[dbo].[TableName] AS TN
            WHERE    TN.actvt_ym >= 201809
                AND TN.actvt_ym < 201810
            ORDER BY TN.PrimaryKeyColumnName
        )
        DELETE TN
        FROM RowNums AS RN
        INNER JOIN [database].[dbo].[TableName] AS TN
            ON RN.PrimaryKeyColumnName = TN.PrimaryKeyColumnName;
        
        IF @@ROWCOUNT = 0
            BEGIN
            BREAK;
            END;
        END;

    You can adjust the percentage up or down based on your knowledge of the total row count for that month.

    I would not use PERCENT in a loop where you are deleting records,because the number of records you process each loop will decrease exponentially.  You want to process approximately the same number of records each loop, so you should just use a set value.

    I also see no reason to use a ROW_NUMBER (which requires an expensive sort).  The main reason for using a ROW_NUMBER() here would be to help track which records have already been processed, but you don't need to track them, because they've already been deleted and are no longer available to be processed on subsequent loops.

    There also no reason to do the self join.  It just adds unnecessary overhead and doesn't gain you anything.

    DECLARE
        @cur_count INTEGER,
        @batch_size INTEGER = 50000,
        @date_begin INTEGER = CONVERT(CHAR(6), DATEADD(MONTH, -1, GETDATE()), 112),
        @date_end INTEGER = CONVERT(CHAR(6), GETDATE(), 112);

    SET @cur_count = @batch_size

    WHILE @cur_count = @batch_size
    BEGIN
        DELETE TOP (@batch_size)
        FROM [database].[dbo].[TableName]
        WHERE [actvt_ym] BETWEEN @date_begin AND @date_end;

        SET @cur_count = @@ROWCOUNT;
    end

    Drew

    Oops!   Kinda forgot about the impact of what using a percentage would do.  Here's my corrected code, based on yours, that addresses the date issue:
    DECLARE
        @RowCount    AS int    = 1,
      @BatchSize    AS int    = 50000,
        @Today        AS date    = '20180901';
    DECLARE
      @StartMonth    AS int    = CONVERT(char(6), DATEFROMPARTS(YEAR(@Today), MONTH(@Today), 1), 112),
      @EndMonth    AS int    = CONVERT(char(6), DATEADD(month, 1, @Today), 112);

    WHILE @RowCount > 0
        BEGIN
      DELETE TOP (@BatchSize)
      FROM [database].[schema].[TableName] AS TN
      WHERE    TN.actvt_ym >= @StartMonth
            AND TN.actvt_ym < @EndMonth;

      SET @RowCount = @@ROWCOUNT;
        END;

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply