Table Delete

  • I have a very large table and I'm looking to do a cleanup based on a non-keyed field. This table has many concurrent insert transactions running against it, and I can't afford the downtime to create an index on this field. I'm looking for suggestions on using the PK (Id) to help speed up the deletes and minimize locking on the table.

    Right now the driver for the cleanup is something like this:

    delete from Image_Classification_Master
    where convert(date, SpoolStartDt) = '02/22/2022'

    Thanks.

    CREATE TABLE [dbo].[Image_Classification_Master](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [ImageName] [nvarchar](500) NULL,
    [ImageType] [nvarchar](200) NULL,
    [ImageSource] [image] NULL,
    [ReceivedDateTime] [datetime] NULL,
    [ImagePath] [nvarchar](500) NULL,
    [Site] [varchar](3) NULL,
    [MachineNbr] [int] NULL,
    [LineNbr] [int] NULL,
    [TakeUpNbr] [int] NULL,
    [SpoolNbr] [int] NULL,
    [ImageIndex] [int] NULL,
    [CameraNbr] [varchar](2) NULL,
    [SpoolStartDt] [datetime] NULL,
    [SpoolStartTime] [time](7) NULL,
    [DefectDate] [datetime] NULL,
    [DefectTime] [time](7) NULL,
    [DefectNbr] [int] NULL,
    [DefectClass] [varchar](100) NULL,
    [Reviewer] [nvarchar](50) NULL,
    [UserDefectInput] [nvarchar](10) NULL,
    CONSTRAINT [PK_Image_Classification_Master] PRIMARY KEY CLUSTERED
    (
    [Id] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
    GO

    ALTER TABLE [dbo].[Image_Classification_Master] ADD CONSTRAINT [DF_Image_Classification_Master_ReceivedDateTime] DEFAULT (getdate()) FOR [ReceivedDateTime]
    GO

  • Oof. Even if you put an index on the SpoolStartDt column, because you wrap the value in a function to convert it, you're going to get scans anyway. Why do that? It's a datetime column. Compare it to a datetime value. Then an index could help. Otherwise, you're just looking at scans and there's no way around that. Because the clustered index is Id, that's what must be used to delete values. No getting around it. So the rows are either found through a scan of the clustered index, or you build an index on the appropriate column (which will absolutely have some effect on the system, unless you're running Enterprise, in which case you can do an online index creation) and pay that cost so it doesn't have to do the table scan. No magic way around this, really.
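
    For example, a minimal sketch of a sargable version of the cleanup (assuming the intent is "everything whose SpoolStartDt falls on 2022-02-22"); with the column left bare, an index on SpoolStartDt could actually be used to seek:

    DELETE FROM dbo.Image_Classification_Master
    WHERE SpoolStartDt >= '20220222'
      AND SpoolStartDt <  '20220223';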

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • @ Bruin,

    Which edition of 2016 do you have?  Standard or Enterprise?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    @ Bruin,

    Which edition of 2016 do you have?  Standard or Enterprise?

    Also, are there any FKs that are pointing AT this table?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If you're on Enterprise Edition, you can do an online build of an index on SpoolStartDt and then use that index to do the DELETEs. If you're not on Enterprise, the index build would lock up the table, so you should not try to create the index offline.

    Also, you don't have to do a whole day at a time if the query plan shows a scan, or if a full day takes too long to process with all the other activity on the table. For example, here's an approach that deletes 4 hours' worth of old activity at a time.

    CREATE UNIQUE NONCLUSTERED INDEX [IX_Image_Classification_Master] ON dbo.Image_Classification_Master ( SpoolStartDt, Id )
    WITH ( DATA_COMPRESSION = ROW, FILLFACTOR = 95, ONLINE = ON, SORT_IN_TEMPDB = ON );

    DELETE FROM dbo.Image_Classification_Master
    WHERE SpoolStartDt >= '20220222 00:00' AND SpoolStartDt < '20220222 04:00';

    DELETE FROM dbo.Image_Classification_Master
    WHERE SpoolStartDt >= '20220222 04:00' AND SpoolStartDt < '20220222 08:00';

    DELETE FROM dbo.Image_Classification_Master
    WHERE SpoolStartDt >= '20220222 08:00' AND SpoolStartDt < '20220222 12:00';

    DELETE FROM dbo.Image_Classification_Master
    WHERE SpoolStartDt >= '20220222 12:00' AND SpoolStartDt < '20220222 16:00';

    DELETE FROM dbo.Image_Classification_Master
    WHERE SpoolStartDt >= '20220222 16:00' AND SpoolStartDt < '20220222 20:00';

    DELETE FROM dbo.Image_Classification_Master
    WHERE SpoolStartDt >= '20220222 20:00' AND SpoolStartDt < '20220223 00:00';

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • When performing deletions on large tables, it is essential to delete rows using the clustered index (in this case, PK(Id)) and to process the deletions in small batches. This approach ensures:

    • Efficiency: The clustered index provides a natural order for the rows in the table, making lookups and deletions faster. Deleting rows directly via the clustered index leverages this ordering, so the expensive scan for matching rows is done once up front rather than on every delete, reducing overall I/O.

    • Minimised locking: Deleting rows in small batches (e.g., 500 or 1,000 rows at a time) ensures that locks on the table are held for shorter durations. This reduces the risk of blocking concurrent transactions that insert or update rows in the same table.

    • Reduced transaction log impact: Large deletions in a single transaction can overwhelm the transaction log, especially if the table contains millions of rows. Batch processing limits the size of each transaction, keeping the transaction log manageable and ensuring that log backups and recovery operations remain efficient.

    • System performance: Small batches prevent long-running transactions, which can degrade system performance and increase the likelihood of deadlocks. By committing each batch independently, the system can continue to process other workloads smoothly.

    In practice:

    • Always delete rows using the PK(Id) or another clustered index to optimise performance.

    • Limit the batch size to 500-2,000 rows, depending on the system's workload and the size of the table.

    • Monitor the process to ensure that the deletion batches complete within acceptable time limits, and adjust the batch size if necessary.

    -- Declare batch size
    DECLARE @BatchSize INT = 1000;
    DECLARE @RowsAffected INT = 1;

    -- Temporary table to hold IDs of rows to be deleted
    DROP TABLE IF EXISTS #ToDelete;
    CREATE TABLE #ToDelete (Id INT PRIMARY KEY CLUSTERED);

    -- Temporary table to hold batch IDs
    DROP TABLE IF EXISTS #BatchToDelete;
    CREATE TABLE #BatchToDelete (Id INT PRIMARY KEY CLUSTERED);

    -- Step 1: Insert IDs of rows to be deleted into the temporary table
    INSERT INTO #ToDelete (Id)
    SELECT Id
    FROM dbo.Image_Classification_Master WITH(NOLOCK)
    WHERE CONVERT(DATE, SpoolStartDt) = '2022-02-22'
    OPTION (MAXDOP 1);

    -- Step 2: Loop to delete rows in batches
    WHILE @RowsAffected > 0
    BEGIN
        -- Begin transaction to ensure consistency
        BEGIN TRANSACTION;

        -- Populate the batch table with the next set of IDs to delete
        ;WITH cte AS
        (
            SELECT TOP (@BatchSize) ID
            FROM #ToDelete
        )
        DELETE FROM cte
        OUTPUT deleted.ID INTO #BatchToDelete (ID);

        -- Perform the deletion in the main table using EXISTS
        DELETE dbo.Image_Classification_Master
        WHERE EXISTS (SELECT 1
                      FROM #BatchToDelete b
                      WHERE b.Id = Image_Classification_Master.Id);

        -- Get the number of rows affected in this batch
        SET @RowsAffected = @@ROWCOUNT;

        -- Clear the batch table
        DELETE FROM #BatchToDelete;

        -- Commit transaction
        COMMIT TRANSACTION;

        -- Optional: wait 100 ms to give other processes using the system a chance
        WAITFOR DELAY '00:00:00:100';

        -- Optional: print progress
        RAISERROR('Deleted %d rows in this batch.', 0, 1, @RowsAffected) WITH NOWAIT;
    END;

    -- Cleanup
    DROP TABLE IF EXISTS #ToDelete, #BatchToDelete;
    PRINT 'Deletion completed.';

  • Looks like I need both of these fields for startdt and starttime

    [SpoolStartDt] [datetime] NULL,

    [SpoolStartTime] [time](7) NULL,

    example:

    2022-02-23 00:00:00.000

    16:41:00.0000000

  • Jonathan, can your example include the 2 fields I mention above, so I could use a 4-hour window as also suggested?

    Thanks.

  • Bruin wrote:

    Jonathan can your example include 2 fields I mention above? so I could use a 4hr window as also suggested.

    Thanks.

    Certainly! Simply include the criteria in the WHERE clause of the SELECT statement that retrieves the IDs.

    INSERT INTO #ToDelete (Id)
    SELECT Id
    FROM dbo.Image_Classification_Master
    WHERE CONVERT(DATE, SpoolStartDt) = '2022-02-22'
      AND SpoolStartTime BETWEEN '16:00:00.0000000' AND '20:00:00.0000000';
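
    If you'd rather keep the 4-hour windows half-open, as in the earlier day-splitting example (so a row at exactly 20:00 falls only into the next window), a minimal variant would be:

    INSERT INTO #ToDelete (Id)
    SELECT Id
    FROM dbo.Image_Classification_Master
    WHERE CONVERT(DATE, SpoolStartDt) = '2022-02-22'
      AND SpoolStartTime >= '16:00' AND SpoolStartTime < '20:00';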

    How many rows do you need to delete?

  • This is throwing an error:

    -- Populate the batch table with the next set of IDs to delete

    DELETE TOP (@BatchSize)

    OUTPUT DELETED.Id INTO #BatchToDelete

    FROM #ToDelete;

    Msg 102, Level 15, State 1, Line 27

    Incorrect syntax near 'DELETED'.

  • Bruin wrote:

    This is throwing an error:

    -- Populate the batch table with the next set of IDs to delete DELETE TOP (@BatchSize) OUTPUT DELETED.Id INTO #BatchToDelete FROM #ToDelete;

    Msg 102, Level 15, State 1, Line 27 Incorrect syntax near 'DELETED'.

    Use this instead:

    ;WITH cte AS
    (
        SELECT TOP (@BatchSize) ID
        FROM #ToDelete
    )
    DELETE FROM cte
    OUTPUT deleted.ID INTO #BatchToDelete(ID);
  • Perfect  .. works great ..

    Many Thanks...

  • Bruin wrote:

    Perfect  .. works great ..

    Many Thanks...

    I’ve revised my original response to correct the delete statement - perhaps it could serve as an example.

    I’ve also added WITH (NOLOCK) on the ID selection to make the operation even less intrusive on the production environment.

    Using this method, you could process a day at a time. Be mindful of the batch size to ensure that each batch runs efficiently, ideally completing within 1 to 3 seconds. To further minimise any potential impact on online users, you might consider adding a small WAITFOR DELAY between batches, such as:

    WAITFOR DELAY '00:00:00:100'

    This introduces a 100 ms pause between batches, giving online application users more of a chance to work without being affected by the archiving process.

    Jeff raised a good point about foreign keys on other tables pointing to the one you’re deleting from. It’s important to check the behaviour configured for the foreign keys:

    • CASCADE: Automatically deletes child rows.
    • SET NULL or SET DEFAULT: Updates the foreign key column in the child table to NULL or a default value.
    • NO ACTION or RESTRICT: Prevents the operation if it would violate the constraint, raising an error.

    Additionally, ensure appropriate indexes exist on the foreign key columns to optimise performance and avoid issues during deletion.
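
    For example, a minimal sketch (using the standard catalog views) to see whether any foreign keys reference this table and what their delete rule is:

    SELECT fk.name AS fk_name,
           OBJECT_NAME(fk.parent_object_id) AS referencing_table,
           fk.delete_referential_action_desc AS delete_rule
    FROM sys.foreign_keys AS fk
    WHERE fk.referenced_object_id = OBJECT_ID(N'dbo.Image_Classification_Master');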

  • Thanks again, this table is standalone.

  • How many rows are added to the table each day? And how many rows are currently in the table?
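
    If it helps to check, a quick sketch for an approximate current row count without scanning the whole table (using the partition-stats DMV):

    SELECT SUM(ps.row_count) AS approx_row_count
    FROM sys.dm_db_partition_stats AS ps
    WHERE ps.object_id = OBJECT_ID(N'dbo.Image_Classification_Master')
      AND ps.index_id IN (0, 1); -- heap or clustered index rows only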
