Ways to improve record deletion speed

  • Hello

    I have a table (F_POLICY_TRANSACTION)

    This table has a couple of million rows in it

    I am using a column named POLICY_TRANSACTION_BKEY to select records to delete (approximately 750k using the code below)

    This column has a non-clustered index applied

    This is the code I have used:

    WHILE 1 = 1

    BEGIN

    DELETE TOP(50000)

    FROM F_POLICY_TRANSACTION with (tablockx)

    WHERE [POLICY_TRANSACTION_BKEY] like 'SIC%';

    IF @@ROWCOUNT < 50000 BREAK;

    END

    Problem is, it takes around 10 minutes to run

    Is there any way it can be made more efficient?

    I have tried varying the rowcount with no success

    Thanks

    Damian.

    - Damian

  • When I do this (in batches) I wonder if the selection criteria (i.e. "like 'SIC%'" in your case) takes time on each iteration of the loop.

    I get all the clustered index keys into a #temporary table, ordered by those keys, with an IDENTTIY column, and then delete in batches based on ranges of the ID. Something like this:

    SELECT IDENTITY(int, 1, 1) AS MyID, ClustKey1, ClustKey2, ...

    INTO #TEMP

    FROM F_POLICY_TRANSACTION

    WHERE [POLICY_TRANSACTION_BKEY] like 'SIC%'

    ORDER BY ClustKey1, ClustKey2, ...

    DECLARE@intLoop int = 1

    WHILE 1 = 1

    BEGIN

    DELETE D

    FROM #TEMP AS T

    JOIN F_POLICY_TRANSACTION AS D

    ON D.ClustKey1 = T.ClustKey1

    AND D.ClustKey2 = T.ClustKey2

    AND ...

    WHERE T.ID BETWEEN @intLoop AND @intLoop + 50000

    IF @@ROWCOUNT < 50000 BREAK;

    SELECT @intLoop = @intLoop + 50000

    END

    I'd be interested to hear if that is any more efficient for you.

  • Thanks

    That's runs through a lot quicker - less than a minute!

    Interested to know why though

    Why would the introduction of a temp table (with a join) for comparison run though faster than a straight delete?

    Is it the fact that I am using an int based where in the delete?

    - Damian

  • If you only have a few million rows and you are deleting about 30% of them, then likely the optimizer will use a table scan. If you delete 50k batches, that means 15 table scans. Statistics will likely not be updated automatically between the 15 batches. Also, if your table is a heap and your non-clustered index does not include the PK then a RID lookup will need to be done making the NC index less likely to be used.

    I notice "Old Hands" solution selects by clustered index and also deletes by clustered index. That way whole pages get deleted in a single delete statement. The alternative would be to delete a row from a page with each loop causing much more overhead revisiting pages until a page finally gets freed.

  • For absolute max speed, you should also cluster the temp table on ID:

    SELECT TOP (0) IDENTITY(int, 1, 1) AS ID, ClustKey1, ClustKey2 --, ...

    INTO #TEMP

    FROM F_POLICY_TRANSACTION

    CREATE CLUSTERED INDEX TEMP__CL ON #TEMP ( ID ) WITH ( FILLFACTOR = 100 )

    INSERT INTO #TEMP

    SELECT ClustKey1, ClustKey2 --, ...

    FROM F_POLICY_TRANSACTION

    WHERE [POLICY_TRANSACTION_BKEY] LIKE 'SIC%'

    ORDER BY ClustKey1, ClustKey2 --, ...

    --...rest_of_code_as_before...

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • So, you have a table with ~2,000k rows, you're deleting ~750k rows in batches of 50k, it's taking ~10 minutes, and you want to minimize the runtime duration. I'm guessing the biggest performance hit will be the I/O required by on the fly page reorganization and transaction logging. The problem is that deletes are the most expensive type of operation in that regard. Also, using the batch delete method, you're left with a fragmented table that could be the same size, or maybe even larger, than the original.

    Maybe I'm wrong, we never know for sure until we experiment, but I suspect that selecting the rows you need into another table would require only 10 seconds or so, because selecting into a non-indexed heap table is a minimally logged operation. Once done, drop the original table, rename the temp table, and then re-create indexes (remember to add clustered index first, then non-clustered), which might take another 10 or 20 seconds. Another benefit of the select into method is that once done your table will be logically sorted with no page or index fragmentation.

    SELECT *

    INTO F_POLICY_TRANSACTION_TEMP

    FROM F_POLICY_TRANSACTION (tablock)

    WHERE POLICY_TRANSACTION_BKEY like 'SIC%';

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • ScottPletcher (10/13/2015)


    For absolute max speed, you should also cluster the temp table on ID

    Thanks Scott. I do in fact do that, declaring the ID as a PRIMARY KEY.

    CREATE CLUSTERED INDEX TEMP__CL ON #TEMP ( ID ) WITH ( FILLFACTOR = 100 )

    I think it will benefit from being declared as a UNIQUE index?

  • ok, so I am very nearly there using this:

    SELECT TOP (0) IDENTITY(int,1 ,1) as ID, POLICY_TRANSACTION_BKEY

    INTO #TEMP

    FROM [F_POLICY_TRANSACTION]

    -- Create a clustered index for efficiency

    CREATE CLUSTERED INDEX TEMP__CL ON #TEMP ( ID ) WITH ( FILLFACTOR = 100 )

    INSERT INTO #TEMP

    SELECT POLICY_TRANSACTION_BKEY

    FROM [F_POLICY_TRANSACTION]

    WHERE POLICY_TRANSACTION_BKEY like = 'SIC%'

    ORDER BY POLICY_TRANSACTION_BKEY

    DECLARE@intLoop int = 1

    WHILE 1 = 1

    BEGIN

    DELETE D

    FROM #TEMP AS T

    JOIN [F_POLICY_TRANSACTION] AS D

    ON D.POLICY_TRANSACTION_BKEY = T.POLICY_TRANSACTION_BKEY

    WHERE T.ID BETWEEN @intLoop AND @intLoop + 50000

    IF @@ROWCOUNT < 50000 BREAK;

    SELECT @intLoop = @intLoop + 50000

    END

    go

    I runs through pretty quickly (less than a minute)

    Now spotted an issue - it only deletes half the rows

    Can anybody spot an error

    Possibly an issue with the loop counter but can't quite see it

    Thanks

    - Damian

  • DamianC (10/14/2015)


    Can anybody spot an error

    Nope! :crying:

    I'd suggest trying this to help with debugging:

    SELECT TOP (0) IDENTITY(int,1 ,1) as ID, POLICY_TRANSACTION_BKEY

    INTO #TEMP

    FROM [F_POLICY_TRANSACTION]

    -- Create a clustered index for efficiency

    CREATE [highlight="#ffff11"]UNIQUE[/highlight] CLUSTERED INDEX TEMP__CL ON #TEMP ( ID ) WITH ( FILLFACTOR = 100 )

    [highlight="#ffff11"]DECLARE@intLoop int = 1,

    @intRowCount int,

    @intRowCountTotal int = 0[/highlight]

    INSERT INTO #TEMP

    SELECT POLICY_TRANSACTION_BKEY

    FROM [F_POLICY_TRANSACTION]

    WHERE POLICY_TRANSACTION_BKEY like = 'SIC%'

    ORDER BY POLICY_TRANSACTION_BKEY

    [highlight="#ffff11"]SELECT @intRowCount = @@ROWCOUNT

    RAISERROR (N'INSERT INTO Rows %d', 10, 1, @intRowCount) WITH NOWAIT[/highlight]

    WHILE 1 = 1

    BEGIN

    DELETE D

    FROM #TEMP AS T

    JOIN [F_POLICY_TRANSACTION] AS D

    ON D.POLICY_TRANSACTION_BKEY = T.POLICY_TRANSACTION_BKEY

    WHERE T.ID BETWEEN @intLoop AND @intLoop + 50000

    [highlight="#ffff11"]SELECT @intRowCount = @@ROWCOUNT,

    @intRowCountTotal = @intRowCountTotal + @@ROWCOUNT,

    @intLoop = @intLoop + 50000

    RAISERROR (N'LOOP %d, Rows %d', 10, 1, @intLoop, @intRowCount) WITH NOWAIT

    IF @intRowCount < 50000 BREAK;[/highlight]

    END

    [highlight="#ffff11"]RAISERROR (N'TOTAL ROWS %d', 10, 1, @intRowCountTotal) WITH NOWAIT[/highlight]

    go

  • P.S. I presume the Clustered Index Key (i.e. on POLICY_TRANSACTION_BKEY) in your table [F_POLICY_TRANSACTION] is unique / PKey?

  • Ah, looks like POLICY_TRANSACTION_BKEY wasn't unique

    This table now has an identity surrogate key on it

    Using that worked perfectly

    Thanks

    - Damian

  • Might be worth adding your ID to make the Clustered Index unique (and then explicitly declaring the Clustered Index as UNIQUE) ... otherwise SQL will be adding a uniquifier value instead.

  • FWIW - We have successfully used same method proposed by Kristen-173977 for many years, for both batched DELETEs and UPDATEs. The only difference is we explicitly CREATE the #keys table with a clustered identity ID, and then use INSERT to populate it.

  • Mike Good (10/16/2015)


    FWIW - We have successfully used same method proposed by Kristen-173977 for many years, for both batched DELETEs and UPDATEs. The only difference is we explicitly CREATE the #keys table with a clustered identity ID, and then use INSERT to populate it.

    In the presence of a large number of rows to populate the Temp Table with, you might be surprised at how fast SELECT INTO followed by the creation of the clustered index is even in the FULL Recovery Model.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/1/2015)


    Mike Good (10/16/2015)


    FWIW - We have successfully used same method proposed by Kristen-173977 for many years, for both batched DELETEs and UPDATEs. The only difference is we explicitly CREATE the #keys table with a clustered identity ID, and then use INSERT to populate it.

    In the presence of a large number of rows to populate the Temp Table with, you might be surprised at how fast SELECT INTO followed by the creation of the clustered index is even in the FULL Recovery Model.

    I believe you can create the clustered index before loading the table, and still get the great benefits of minimal logging that SELECT INTO provides, if you code the INSERT as required, i.e., you use a TABLOCK hint on the destination table:

    INSERT INTO table_name ( ... ) WITH (TABLOCK) SELECT ... FROM ...

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply