Ways to improve record deletion speed

DamianC

SSCrazy Eights

Points: 8023
October 13, 2015 at 3:10 am

#312201

Hello
I have a table (F_POLICY_TRANSACTION)
This table has a couple of million rows in it
I am using a column named POLICY_TRANSACTION_BKEY to select records to delete (approximately 750k using the code below)
This column has a non-clustered index applied
This is the code I have used:
WHILE 1 = 1
BEGIN
DELETE TOP(50000)
FROM F_POLICY_TRANSACTION with (tablockx)
WHERE [POLICY_TRANSACTION_BKEY] like 'SIC%';
IF @@ROWCOUNT < 50000 BREAK;
END
Problem is, it takes around 10 minutes to run
Is there any way it can be made more efficient?
I have tried varying the rowcount with no success
Thanks
Damian.
- Damian

Kristen-173977 SSCrazy Eights Points: 8687 · Answer 1

When I do this (in batches) I wonder if the selection criteria (i.e. "like 'SIC%'" in your case) takes time on each iteration of the loop.

I get all the clustered index keys into a #temporary table, ordered by those keys, with an IDENTITY column, and then delete in batches based on ranges of the ID. Something like this:

SELECT IDENTITY(int, 1, 1) AS ID, ClustKey1, ClustKey2, ...
INTO #TEMP
FROM F_POLICY_TRANSACTION
WHERE [POLICY_TRANSACTION_BKEY] LIKE 'SIC%'
ORDER BY ClustKey1, ClustKey2, ...

DECLARE @intLoop int = 1

WHILE 1 = 1
BEGIN
    DELETE D
    FROM #TEMP AS T
    JOIN F_POLICY_TRANSACTION AS D
        ON D.ClustKey1 = T.ClustKey1
        AND D.ClustKey2 = T.ClustKey2
        AND ...
    WHERE T.ID BETWEEN @intLoop AND @intLoop + 50000

    IF @@ROWCOUNT < 50000 BREAK;

    SELECT @intLoop = @intLoop + 50000
END

I'd be interested to hear if that is any more efficient for you.

DamianC SSCrazy Eights Points: 8023 · Answer 2

Thanks

That runs through a lot quicker - less than a minute!

Interested to know why though

Why would the introduction of a temp table (with a join) for comparison run through faster than a straight delete?

Is it the fact that I am using an int based where in the delete?

- Damian

Bill Talada SSChampion Points: 11954 · Answer 3

If you only have a few million rows and you are deleting about 30% of them, then likely the optimizer will use a table scan. If you delete in 50k batches, that means 15 table scans. Statistics will likely not be updated automatically between the 15 batches. Also, if your table is a heap and your non-clustered index does not include the PK, then a RID lookup will need to be done, making the NC index less likely to be used.

I notice that the "Old Hand" solution above selects by clustered index and also deletes by clustered index. That way, whole pages get deleted in a single DELETE statement. The alternative would be to delete a row from a page with each loop, which causes much more overhead from revisiting pages until a page finally gets freed.
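
Whether the table is a heap, and which indexes it has, can be read from the catalog views. A minimal sketch, assuming the table lives in the dbo schema (the thread does not say):

```sql
-- Assumption: the table is in the dbo schema.
-- A row with type_desc = 'HEAP' (index_id 0) means there is no clustered index.
SELECT i.index_id,
       i.name,
       i.type_desc,
       i.is_unique
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.F_POLICY_TRANSACTION')
ORDER BY i.index_id;
```

Looking at the actual execution plan for a single batch would then show whether the non-clustered index on POLICY_TRANSACTION_BKEY is used or the table is scanned.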

ScottPletcher SSC Guru Points: 101238 · Answer 4

For absolute max speed, you should also cluster the temp table on ID:

SELECT TOP (0) IDENTITY(int, 1, 1) AS ID, ClustKey1, ClustKey2 --, ...
INTO #TEMP
FROM F_POLICY_TRANSACTION

CREATE CLUSTERED INDEX TEMP__CL ON #TEMP ( ID ) WITH ( FILLFACTOR = 100 )

INSERT INTO #TEMP
SELECT ClustKey1, ClustKey2 --, ...
FROM F_POLICY_TRANSACTION
WHERE [POLICY_TRANSACTION_BKEY] LIKE 'SIC%'
ORDER BY ClustKey1, ClustKey2 --, ...

--...rest_of_code_as_before...

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Eric M Russell SSC Guru Points: 125624 · Answer 5

So, you have a table with ~2,000k rows, you're deleting ~750k rows in batches of 50k, it's taking ~10 minutes, and you want to minimize the runtime duration. I'm guessing the biggest performance hit will be the I/O required by on-the-fly page reorganization and transaction logging. The problem is that deletes are the most expensive type of operation in that regard. Also, using the batch delete method, you're left with a fragmented table that could be the same size, or maybe even larger, than the original.

Maybe I'm wrong, we never know for sure until we experiment, but I suspect that selecting the rows you want to keep into another table would require only 10 seconds or so, because selecting into a non-indexed heap table is a minimally logged operation (under the SIMPLE or BULK_LOGGED recovery model). Once done, drop the original table, rename the new table, and then re-create indexes (remember to add clustered index first, then non-clustered), which might take another 10 or 20 seconds. Another benefit of the select into method is that once done your table will be logically sorted with no page or index fragmentation.

-- Copy the rows to KEEP, i.e. everything that does not match the delete criteria.
-- NOT LIKE also filters out NULL keys; add "OR POLICY_TRANSACTION_BKEY IS NULL" if the column is nullable.
SELECT *
INTO F_POLICY_TRANSACTION_TEMP
FROM F_POLICY_TRANSACTION WITH (TABLOCK)
WHERE POLICY_TRANSACTION_BKEY NOT LIKE 'SIC%';
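
The remaining steps (drop, rename, re-create indexes) might look roughly like the sketch below once F_POLICY_TRANSACTION_TEMP holds the rows to keep. The dbo schema, both index names, and the clustered key column POLICY_TRANSACTION_ID are hypothetical placeholders, since the thread does not describe the table's real index definitions.

```sql
-- Sketch only. Assumptions: dbo schema; the index names and the clustered key
-- column (POLICY_TRANSACTION_ID) are hypothetical placeholders.
-- DROP TABLE also removes the original table's constraints, triggers and
-- permissions, and fails while foreign keys still reference the table,
-- so script those out first if they need to be re-created.
BEGIN TRANSACTION;

DROP TABLE dbo.F_POLICY_TRANSACTION;

EXEC sp_rename N'dbo.F_POLICY_TRANSACTION_TEMP', N'F_POLICY_TRANSACTION';

COMMIT TRANSACTION;

-- Clustered index first, then non-clustered.
CREATE CLUSTERED INDEX CIX_F_POLICY_TRANSACTION
    ON dbo.F_POLICY_TRANSACTION (POLICY_TRANSACTION_ID);

CREATE NONCLUSTERED INDEX IX_F_POLICY_TRANSACTION_BKEY
    ON dbo.F_POLICY_TRANSACTION (POLICY_TRANSACTION_BKEY);
```

Wrapping the drop and rename in one transaction keeps other sessions from seeing a moment where the table does not exist.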

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Kristen-173977 SSCrazy Eights Points: 8687 · Answer 6

ScottPletcher (10/13/2015)
For absolute max speed, you should also cluster the temp table on ID

Thanks Scott. I do in fact do that, declaring the ID as a PRIMARY KEY.

CREATE CLUSTERED INDEX TEMP__CL ON #TEMP ( ID ) WITH ( FILLFACTOR = 100 )

I think it will benefit from being declared as a UNIQUE index?

DamianC SSCrazy Eights Points: 8023 · Answer 7

ok, so I am very nearly there using this:

SELECT TOP (0) IDENTITY(int, 1, 1) AS ID, POLICY_TRANSACTION_BKEY
INTO #TEMP
FROM [F_POLICY_TRANSACTION]

-- Create a clustered index for efficiency
CREATE CLUSTERED INDEX TEMP__CL ON #TEMP ( ID ) WITH ( FILLFACTOR = 100 )

INSERT INTO #TEMP
SELECT POLICY_TRANSACTION_BKEY
FROM [F_POLICY_TRANSACTION]
WHERE POLICY_TRANSACTION_BKEY LIKE 'SIC%'
ORDER BY POLICY_TRANSACTION_BKEY

DECLARE @intLoop int = 1

WHILE 1 = 1
BEGIN
    DELETE D
    FROM #TEMP AS T
    JOIN [F_POLICY_TRANSACTION] AS D
        ON D.POLICY_TRANSACTION_BKEY = T.POLICY_TRANSACTION_BKEY
    WHERE T.ID BETWEEN @intLoop AND @intLoop + 50000

    IF @@ROWCOUNT < 50000 BREAK;

    SELECT @intLoop = @intLoop + 50000
END
GO

It runs through pretty quickly (less than a minute)

Now spotted an issue - it only deletes half the rows

Can anybody spot an error?

Possibly an issue with the loop counter but can't quite see it

Thanks

- Damian

Kristen-173977 SSCrazy Eights Points: 8687 · Answer 8

DamianC (10/14/2015)
Can anybody spot an error

Nope! :crying:

I'd suggest trying this to help with debugging:

SELECT TOP (0) IDENTITY(int, 1, 1) AS ID, POLICY_TRANSACTION_BKEY
INTO #TEMP
FROM [F_POLICY_TRANSACTION]

-- Create a clustered index for efficiency (changed: now UNIQUE)
CREATE UNIQUE CLUSTERED INDEX TEMP__CL ON #TEMP ( ID ) WITH ( FILLFACTOR = 100 )

-- Added: row counters for debugging
DECLARE @intLoop int = 1,
        @intRowCount int,
        @intRowCountTotal int = 0

INSERT INTO #TEMP
SELECT POLICY_TRANSACTION_BKEY
FROM [F_POLICY_TRANSACTION]
WHERE POLICY_TRANSACTION_BKEY LIKE 'SIC%'
ORDER BY POLICY_TRANSACTION_BKEY

-- Added: report how many keys were captured
SELECT @intRowCount = @@ROWCOUNT
RAISERROR (N'INSERT INTO Rows %d', 10, 1, @intRowCount) WITH NOWAIT

WHILE 1 = 1
BEGIN
    DELETE D
    FROM #TEMP AS T
    JOIN [F_POLICY_TRANSACTION] AS D
        ON D.POLICY_TRANSACTION_BKEY = T.POLICY_TRANSACTION_BKEY
    WHERE T.ID BETWEEN @intLoop AND @intLoop + 50000

    -- Added: capture the row count once, then report progress
    SELECT @intRowCount = @@ROWCOUNT,
           @intRowCountTotal = @intRowCountTotal + @@ROWCOUNT,
           @intLoop = @intLoop + 50000
    RAISERROR (N'LOOP %d, Rows %d', 10, 1, @intLoop, @intRowCount) WITH NOWAIT

    IF @intRowCount < 50000 BREAK;
END

-- Added: overall total
RAISERROR (N'TOTAL ROWS %d', 10, 1, @intRowCountTotal) WITH NOWAIT
GO

Kristen-173977 SSCrazy Eights Points: 8687 · Answer 9

P.S. I presume the Clustered Index Key (i.e. on POLICY_TRANSACTION_BKEY) in your table [F_POLICY_TRANSACTION] is unique / PKey?
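
A quick way to test that presumption is to look for key values that occur more than once. A minimal sketch using only the table and column named in the thread:

```sql
-- Lists key values that occur more than once among the rows to be deleted.
-- No rows returned means POLICY_TRANSACTION_BKEY is unique within that set.
SELECT POLICY_TRANSACTION_BKEY,
       COUNT(*) AS RowsPerKey
FROM F_POLICY_TRANSACTION
WHERE POLICY_TRANSACTION_BKEY LIKE 'SIC%'
GROUP BY POLICY_TRANSACTION_BKEY
HAVING COUNT(*) > 1
ORDER BY RowsPerKey DESC;
```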

DamianC SSCrazy Eights Points: 8023 · Answer 10

Ah, looks like POLICY_TRANSACTION_BKEY wasn't unique

This table now has an identity surrogate key on it

Using that worked perfectly

Thanks

- Damian
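
One plausible reading of why the non-unique key stopped the loop early: when rows sharing a POLICY_TRANSACTION_BKEY value fall on both sides of a batch boundary, the join deletes all of them in the earlier batch, so the next batch can delete fewer than 50000 rows and trip the @@ROWCOUNT check while keys remain. A minimal sketch of the loop keyed on the surrogate key instead, assuming the new identity column is an int named POLICY_TRANSACTION_ID (the thread gives neither its name nor its type). It also ends the loop on the highest ID in the key table rather than on @@ROWCOUNT, and uses `- 1` in the range because BETWEEN is inclusive:

```sql
-- Assumption: POLICY_TRANSACTION_ID is the unique int identity surrogate key
-- (hypothetical name and type).
IF OBJECT_ID('tempdb..#TEMP') IS NOT NULL DROP TABLE #TEMP;

CREATE TABLE #TEMP
(
    ID                    int IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
    POLICY_TRANSACTION_ID int NOT NULL
);

-- Capture the keys of the rows to delete once, in key order.
INSERT INTO #TEMP (POLICY_TRANSACTION_ID)
SELECT POLICY_TRANSACTION_ID
FROM F_POLICY_TRANSACTION
WHERE POLICY_TRANSACTION_BKEY LIKE 'SIC%'
ORDER BY POLICY_TRANSACTION_ID;

DECLARE @intBatchSize int = 50000,
        @intLoop      int = 1,
        @intMaxID     int;

-- NULL when nothing matched, in which case the loop body never runs.
SELECT @intMaxID = MAX(ID) FROM #TEMP;

WHILE @intLoop <= @intMaxID
BEGIN
    DELETE D
    FROM #TEMP AS T
    JOIN F_POLICY_TRANSACTION AS D
        ON D.POLICY_TRANSACTION_ID = T.POLICY_TRANSACTION_ID
    WHERE T.ID BETWEEN @intLoop AND @intLoop + @intBatchSize - 1;

    SET @intLoop = @intLoop + @intBatchSize;
END;
```

Because each row in #TEMP now matches exactly one row in F_POLICY_TRANSACTION, every batch removes at most @intBatchSize rows and no batch can consume rows that belong to a later range.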

Kristen-173977 SSCrazy Eights Points: 8687 · Answer 11

Might be worth adding your ID to make the Clustered Index unique (and then explicitly declaring the Clustered Index as UNIQUE) ... otherwise SQL will be adding a uniquifier value instead.

Mike Good SSCertifiable Points: 7415 · Answer 12

FWIW - We have successfully used same method proposed by Kristen-173977 for many years, for both batched DELETEs and UPDATEs. The only difference is we explicitly CREATE the #keys table with a clustered identity ID, and then use INSERT to populate it.

Jeff Moden SSC Guru Points: 1004704 · Answer 13

Mike Good (10/16/2015)
FWIW - We have successfully used same method proposed by Kristen-173977 for many years, for both batched DELETEs and UPDATEs. The only difference is we explicitly CREATE the #keys table with a clustered identity ID, and then use INSERT to populate it.

In the presence of a large number of rows to populate the Temp Table with, you might be surprised at how fast SELECT INTO followed by the creation of the clustered index is even in the FULL Recovery Model.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

ScottPletcher SSC Guru Points: 101238 · Answer 14

Jeff Moden (11/1/2015)
Mike Good (10/16/2015)
FWIW - We have successfully used same method proposed by Kristen-173977 for many years, for both batched DELETEs and UPDATEs. The only difference is we explicitly CREATE the #keys table with a clustered identity ID, and then use INSERT to populate it.
In the presence of a large number of rows to populate the Temp Table with, you might be surprised at how fast SELECT INTO followed by the creation of the clustered index is even in the FULL Recovery Model.

I believe you can create the clustered index before loading the table, and still get the great benefits of minimal logging that SELECT INTO provides, if you code the INSERT as required, i.e., you use a TABLOCK hint on the destination table:

INSERT INTO table_name WITH (TABLOCK) ( ... ) SELECT ... FROM ...
