October 13, 2015 at 3:10 am
Hello
I have a table (F_POLICY_TRANSACTION)
This table has a couple of million rows in it
I am using a column named POLICY_TRANSACTION_BKEY to select records to delete (approximately 750k using the code below)
This column has a non-clustered index applied
This is the code I have used:
WHILE 1 = 1
BEGIN
DELETE TOP(50000)
FROM F_POLICY_TRANSACTION with (tablockx)
WHERE [POLICY_TRANSACTION_BKEY] like 'SIC%';
IF @@ROWCOUNT < 50000 BREAK;
END
Problem is, it takes around 10 minutes to run
Is there any way it can be made more efficient?
I have tried varying the rowcount with no success
Thanks
Damian.
- Damian
October 13, 2015 at 3:24 am
When I do this (in batches) I wonder if the selection criteria (i.e. "like 'SIC%'" in your case) takes time on each iteration of the loop.
I get all the clustered index keys into a #temporary table, ordered by those keys, with an IDENTITY column, and then delete in batches based on ranges of the ID. Something like this:
SELECT IDENTITY(int, 1, 1) AS ID, ClustKey1, ClustKey2, ...
INTO #TEMP
FROM F_POLICY_TRANSACTION
WHERE [POLICY_TRANSACTION_BKEY] LIKE 'SIC%'
ORDER BY ClustKey1, ClustKey2, ...
DECLARE @intLoop int = 1
WHILE 1 = 1
BEGIN
DELETE D
FROM #TEMP AS T
JOIN F_POLICY_TRANSACTION AS D
ON D.ClustKey1 = T.ClustKey1
AND D.ClustKey2 = T.ClustKey2
AND ...
WHERE T.ID BETWEEN @intLoop AND @intLoop + 49999
IF @@ROWCOUNT < 50000 BREAK;
SELECT @intLoop = @intLoop + 50000
END
I'd be interested to hear if that is any more efficient for you.
October 13, 2015 at 9:33 am
Thanks
That runs through a lot quicker - less than a minute!
Interested to know why though
Why would the introduction of a temp table (with a join) for comparison run through faster than a straight delete?
Is it the fact that I am using an int-based WHERE clause in the delete?
- Damian
October 13, 2015 at 9:55 am
If you only have a few million rows and you are deleting about 30% of them, then the optimizer will likely use a table scan. If you delete in 50k batches, that means 15 table scans. Statistics will likely not be updated automatically between the 15 batches. Also, if your table is a heap and your non-clustered index does not include the PK, then a RID lookup will need to be done, making the NC index less likely to be used.
I notice the "Old Hand"'s solution selects by clustered index and also deletes by clustered index. That way whole pages get deleted in a single DELETE statement. The alternative would be to delete a row from a page with each loop, causing much more overhead revisiting pages until a page finally gets freed.
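If stale statistics are part of the problem, one possible mitigation (untested here; the sampling rate is an illustrative assumption) is to refresh them between batches of the original loop:

```sql
WHILE 1 = 1
BEGIN
    DELETE TOP (50000)
    FROM F_POLICY_TRANSACTION WITH (TABLOCKX)
    WHERE [POLICY_TRANSACTION_BKEY] LIKE 'SIC%';

    IF @@ROWCOUNT < 50000 BREAK;

    -- Hypothetical tweak: refresh statistics so later iterations
    -- can still consider the non-clustered index.
    UPDATE STATISTICS F_POLICY_TRANSACTION WITH SAMPLE 25 PERCENT;
END
```

Whether this pays for itself depends on how long the statistics refresh takes relative to the scans it avoids.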
October 13, 2015 at 10:02 am
For absolute max speed, you should also cluster the temp table on ID:
SELECT TOP (0) IDENTITY(int, 1, 1) AS ID, ClustKey1, ClustKey2 --, ...
INTO #TEMP
FROM F_POLICY_TRANSACTION
CREATE CLUSTERED INDEX TEMP__CL ON #TEMP ( ID ) WITH ( FILLFACTOR = 100 )
INSERT INTO #TEMP
SELECT ClustKey1, ClustKey2 --, ...
FROM F_POLICY_TRANSACTION
WHERE [POLICY_TRANSACTION_BKEY] LIKE 'SIC%'
ORDER BY ClustKey1, ClustKey2 --, ...
--...rest_of_code_as_before...
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
October 13, 2015 at 10:29 am
So, you have a table with ~2 million rows, you're deleting ~750k rows in batches of 50k, it's taking ~10 minutes, and you want to minimize the runtime. I'm guessing the biggest performance hit is the I/O required by on-the-fly page reorganization and transaction logging. The problem is that deletes are the most expensive type of operation in that regard. Also, using the batch delete method, you're left with a fragmented table that could be the same size as, or maybe even larger than, the original.
Maybe I'm wrong (we never know for sure until we experiment), but I suspect that selecting the rows you need to keep into another table would require only 10 seconds or so, because selecting into a non-indexed heap table is a minimally logged operation. Once done, drop the original table, rename the temp table, and then re-create the indexes (remember to add the clustered index first, then the non-clustered ones), which might take another 10 or 20 seconds. Another benefit of the SELECT INTO method is that once done your table will be logically sorted with no page or index fragmentation.
SELECT *
INTO F_POLICY_TRANSACTION_TEMP
FROM F_POLICY_TRANSACTION WITH (TABLOCK)
WHERE POLICY_TRANSACTION_BKEY NOT LIKE 'SIC%';
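The drop/rename/re-index swap described above could look something like the following sketch; the index name and key column are illustrative assumptions, not taken from the original schema:

```sql
-- After verifying the row counts in F_POLICY_TRANSACTION_TEMP:
DROP TABLE F_POLICY_TRANSACTION;

EXEC sp_rename 'F_POLICY_TRANSACTION_TEMP', 'F_POLICY_TRANSACTION';

-- Re-create the clustered index first, then any non-clustered ones
-- (names and columns here are hypothetical).
CREATE CLUSTERED INDEX CIX_F_POLICY_TRANSACTION
    ON F_POLICY_TRANSACTION (POLICY_TRANSACTION_BKEY);
```

Bear in mind this needs an outage window: any permissions, constraints, triggers, and foreign keys on the original table have to be scripted out and re-applied as well.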
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
October 13, 2015 at 12:02 pm
ScottPletcher (10/13/2015)
For absolute max speed, you should also cluster the temp table on ID
Thanks Scott. I do in fact do that, declaring the ID as a PRIMARY KEY.
CREATE CLUSTERED INDEX TEMP__CL ON #TEMP ( ID ) WITH ( FILLFACTOR = 100 )
I think it will benefit from being declared as a UNIQUE index?
October 14, 2015 at 2:53 am
ok, so I am very nearly there using this:
SELECT TOP (0) IDENTITY(int, 1, 1) AS ID, POLICY_TRANSACTION_BKEY
INTO #TEMP
FROM [F_POLICY_TRANSACTION]
-- Create a clustered index for efficiency
CREATE CLUSTERED INDEX TEMP__CL ON #TEMP ( ID ) WITH ( FILLFACTOR = 100 )
INSERT INTO #TEMP
SELECT POLICY_TRANSACTION_BKEY
FROM [F_POLICY_TRANSACTION]
WHERE POLICY_TRANSACTION_BKEY LIKE 'SIC%'
ORDER BY POLICY_TRANSACTION_BKEY
DECLARE @intLoop int = 1
WHILE 1 = 1
BEGIN
DELETE D
FROM #TEMP AS T
JOIN [F_POLICY_TRANSACTION] AS D
ON D.POLICY_TRANSACTION_BKEY = T.POLICY_TRANSACTION_BKEY
WHERE T.ID BETWEEN @intLoop AND @intLoop + 49999
IF @@ROWCOUNT < 50000 BREAK;
SELECT @intLoop = @intLoop + 50000
END
GO
go
It runs through pretty quickly (less than a minute)
Now I've spotted an issue - it only deletes half the rows
Can anybody spot an error?
Possibly an issue with the loop counter, but I can't quite see it
Thanks
- Damian
October 14, 2015 at 3:57 am
DamianC (10/14/2015)
Can anybody spot an error
Nope! :crying:
I'd suggest trying this to help with debugging:
SELECT TOP (0) IDENTITY(int, 1, 1) AS ID, POLICY_TRANSACTION_BKEY
INTO #TEMP
FROM [F_POLICY_TRANSACTION]
-- Create a clustered index for efficiency
CREATE UNIQUE CLUSTERED INDEX TEMP__CL ON #TEMP ( ID ) WITH ( FILLFACTOR = 100 )
DECLARE @intLoop int = 1,
        @intRowCount int,
        @intRowCountTotal int = 0
INSERT INTO #TEMP
SELECT POLICY_TRANSACTION_BKEY
FROM [F_POLICY_TRANSACTION]
WHERE POLICY_TRANSACTION_BKEY LIKE 'SIC%'
ORDER BY POLICY_TRANSACTION_BKEY
SELECT @intRowCount = @@ROWCOUNT
RAISERROR (N'INSERT INTO Rows %d', 10, 1, @intRowCount) WITH NOWAIT
WHILE 1 = 1
BEGIN
DELETE D
FROM #TEMP AS T
JOIN [F_POLICY_TRANSACTION] AS D
ON D.POLICY_TRANSACTION_BKEY = T.POLICY_TRANSACTION_BKEY
WHERE T.ID BETWEEN @intLoop AND @intLoop + 49999
SELECT @intRowCount = @@ROWCOUNT,
       @intRowCountTotal = @intRowCountTotal + @@ROWCOUNT,
       @intLoop = @intLoop + 50000
RAISERROR (N'LOOP %d, Rows %d', 10, 1, @intLoop, @intRowCount) WITH NOWAIT
IF @intRowCount < 50000 BREAK;
END
RAISERROR (N'TOTAL ROWS %d', 10, 1, @intRowCountTotal) WITH NOWAIT
go
October 14, 2015 at 4:00 am
P.S. I presume the Clustered Index Key (i.e. on POLICY_TRANSACTION_BKEY) in your table [F_POLICY_TRANSACTION] is unique / PKey?
October 14, 2015 at 8:19 am
Ah, looks like POLICY_TRANSACTION_BKEY wasn't unique
This table now has an identity surrogate key on it
Using that worked perfectly
Thanks
- Damian
October 15, 2015 at 2:41 am
Might be worth adding your ID to make the clustered index unique (and then explicitly declaring the clustered index as UNIQUE) ... otherwise SQL Server will be adding a hidden uniquifier value instead.
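Assuming the new identity column on the main table is called ID (my naming; the thread doesn't state it), that might look like:

```sql
-- Append the surrogate key so the clustered index can be declared UNIQUE,
-- avoiding the hidden 4-byte uniquifier SQL Server would otherwise add
-- to duplicate key values.
CREATE UNIQUE CLUSTERED INDEX CIX_F_POLICY_TRANSACTION
    ON F_POLICY_TRANSACTION (POLICY_TRANSACTION_BKEY, ID);
```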
October 16, 2015 at 8:45 am
FWIW - We have successfully used same method proposed by Kristen-173977 for many years, for both batched DELETEs and UPDATEs. The only difference is we explicitly CREATE the #keys table with a clustered identity ID, and then use INSERT to populate it.
November 1, 2015 at 7:54 pm
Mike Good (10/16/2015)
FWIW - We have successfully used same method proposed by Kristen-173977 for many years, for both batched DELETEs and UPDATEs. The only difference is we explicitly CREATE the #keys table with a clustered identity ID, and then use INSERT to populate it.
In the presence of a large number of rows to populate the Temp Table with, you might be surprised at how fast SELECT INTO followed by the creation of the clustered index is even in the FULL Recovery Model.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 2, 2015 at 9:26 am
Jeff Moden (11/1/2015)
Mike Good (10/16/2015)
FWIW - We have successfully used same method proposed by Kristen-173977 for many years, for both batched DELETEs and UPDATEs. The only difference is we explicitly CREATE the #keys table with a clustered identity ID, and then use INSERT to populate it.
In the presence of a large number of rows to populate the Temp Table with, you might be surprised at how fast SELECT INTO followed by the creation of the clustered index is even in the FULL Recovery Model.
I believe you can create the clustered index before loading the table, and still get the great benefits of minimal logging that SELECT INTO provides, if you code the INSERT as required, i.e., you use a TABLOCK hint on the destination table:
INSERT INTO table_name ( ... ) WITH (TABLOCK) SELECT ... FROM ...
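A concrete sketch of that pattern using the thread's key table (the #keys name and the BKEY data type are assumptions; note that minimal logging for INSERT ... SELECT also depends on version and recovery-model conditions, so treat this as illustrative):

```sql
-- Create the temp table with its clustered index up front...
CREATE TABLE #keys
(
    ID int IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
    POLICY_TRANSACTION_BKEY varchar(50) NOT NULL  -- assumed data type
);

-- ...then populate it with a TABLOCK hint on the destination table.
INSERT INTO #keys WITH (TABLOCK) (POLICY_TRANSACTION_BKEY)
SELECT POLICY_TRANSACTION_BKEY
FROM F_POLICY_TRANSACTION
WHERE POLICY_TRANSACTION_BKEY LIKE 'SIC%'
ORDER BY POLICY_TRANSACTION_BKEY;
```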
Viewing 15 posts - 1 through 15 (of 18 total)