September 15, 2016 at 2:21 pm
The thing I've realized from this, which should have been obvious, is that when you loop through batches using TOP, you're always scanning the table from the beginning and going back over rows you have already examined. If you have a huge table and a comparatively low TOP setting, you get hundreds of scans.
With the range technique, you walk through the table once. No wonder it's so much faster.
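For anyone following along, here is a minimal sketch of that range-walk idea. It assumes a hypothetical table dbo.MyTable with an ascending integer clustered key ID and a 100,000-row window; add whatever delete criteria you actually need to the WHERE clause.

DECLARE @StartID BIGINT, @MaxID BIGINT, @BatchSize INT = 100000;

SELECT @StartID = MIN(ID), @MaxID = MAX(ID) FROM dbo.MyTable;

WHILE @StartID <= @MaxID
BEGIN
    DELETE FROM dbo.MyTable
    WHERE ID >= @StartID
      AND ID <  @StartID + @BatchSize;     -- one clustered index range seek per batch

    SET @StartID = @StartID + @BatchSize;  -- slide the window forward; rows already visited are never rescanned
END;

Each batch only touches its own slice of the clustered index, which is why the table ends up being walked just once.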
:w00t:
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
September 15, 2016 at 8:44 pm
September 16, 2016 at 1:34 am
Jeff Moden (9/13/2016)
ChrisM@Work (9/13/2016)
Eric M Russell (9/12/2016)
4 million rows is actually not a lot, but it depends on the speed of your storage system and whether the table will need to remain online and usable while the deletion is in process. Essentially, you want to delete the rows in batches to reduce the size of active transaction logging. The following example deletes from a table in 100,000 row batches and waits for 1 minute between each delete operation. This wait window gives the storage system additional time to keep up and prevents extended periods of blocking by allowing other user processes an opportunity to start and complete. When there are no more rows to be deleted (the last delete operation affected no rows), it breaks from the loop. The number of rows per batch and the wait time between batches can be adjusted as needed.
WHILE 1 = 1
BEGIN;
DELETE TOP(100000) FROM MyTable WHERE <delete criteria>;
IF @@ROWCOUNT = 0 BREAK;
WAITFOR DELAY '00:01:00';
END;
Also, it helps a lot if the delete criteria is indexed.
Batch deletes like this can be crippled by physical reads. You've probably set the batch size based on performance against cached data. The first one or two batches might well have the data to be deleted in cache and will be suitably fast. After that, the deletes can be dragged to the ground by physical reads - the rate-limiting step of the process isn't how fast rows can be deleted, but how fast they can be read from disk. Cache priming can make a very significant difference to batch delete times. Uncommitted reads of the data to be deleted (including all indexes, not just the CI), blackholed into variables, take next to no time. We have one observation of a batched delete which would have taken six weeks being cut down to a mere few hours using this method.
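To make that concrete, here is a minimal sketch of the priming idea bolted onto the code above. The key column KeyCol and the index name IX_MyTable_SomeIndex are hypothetical, and <delete criteria> is the same placeholder as before; the priming SELECTs discard the values they read, their only job is to drag the relevant pages into the buffer pool before the delete touches them.

DECLARE @BitBucket INT;   -- "blackhole" variable; values read are simply thrown away

WHILE 1 = 1
BEGIN
    -- Prime the clustered index: uncommitted read of roughly the next batch.
    SELECT TOP (100000) @BitBucket = KeyCol
    FROM MyTable WITH (NOLOCK)
    WHERE <delete criteria>;

    -- Prime a nonclustered index too; repeat this for each index on the table.
    SELECT TOP (100000) @BitBucket = KeyCol
    FROM MyTable WITH (NOLOCK, INDEX(IX_MyTable_SomeIndex))
    WHERE <delete criteria>;

    -- The delete itself now runs largely against cached pages.
    DELETE TOP (100000) FROM MyTable WHERE <delete criteria>;
    IF @@ROWCOUNT = 0 BREAK;

    WAITFOR DELAY '00:01:00';
END;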
You can also modify that code to skip the most expensive iteration of them all: the last one, where nothing is found but it still had to do a full scan looking for rows. If you're trying to delete in batches of 100K rows (for example), then any delete that affects fewer than 100K rows (the batch size) is actually the last iteration.
For the given code above, the change to skip that last iteration is pretty simple...
WHILE 1 = 1
BEGIN;
DELETE TOP(100000) FROM MyTable WHERE <delete criteria>;
IF @@ROWCOUNT < 100000 BREAK;
WAITFOR DELAY '00:01:00';
END;
It won't help with the problem that you've identified, though.
Simple stuff like this is soooo often missed. Thanks for the reminder, Jeff.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
September 21, 2016 at 9:03 am
This is how I delete many millions of rows, a bite at a time. It seems to work very well; however, you will have to see what 'number' your server can handle and what else is running. During the day I usually take down 200,000 - 400,000 at a time, depending on the table. At night, when the mice are out to play, the number goes up to 500,000 - 1,000,000. It is very fast.
Now, this might not work for people with little spare tempdb/log disk space. Be aware that tempdb and log size grow when the number goes up.
So when you're picking your number for your server, be aware of disk space (tempdb and log) and what else is running.
I started with 100,000 and went up.
DECLARE @RowsDeleted INTEGER
SET @RowsDeleted = 1
WHILE (@RowsDeleted > 0)
BEGIN
BEGIN TRANSACTION
DELETE TOP (500000) FROM DATABASE.dbo.TABLENAME
WHERE AcctNbr NOT IN (SELECT xxxx FROM database.dbo.tablename with (nolock))
SET @RowsDeleted = @@ROWCOUNT
COMMIT TRANSACTION
END
Note: I didn't read the previous post. Either way, this works well for us. The sub-select is pulling from a clustered index, if that helps. I will probably add the wait command to this after reading everything. The first batch or two do seem to run 'slower', but after that it deletes at least ten times better than a simple delete (which I would never do on a large scale - scar tissue, you know ;-)). Have a great day.
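If it helps, here is roughly what that loop looks like with the wait dropped in, per the earlier posts. The table and column names are the same placeholders as above, and the one-minute delay is arbitrary; tune it to your server.

DECLARE @RowsDeleted INTEGER;
SET @RowsDeleted = 1;

WHILE (@RowsDeleted > 0)
BEGIN
    BEGIN TRANSACTION;

    DELETE TOP (500000) FROM DATABASE.dbo.TABLENAME
    WHERE AcctNbr NOT IN (SELECT xxxx FROM database.dbo.tablename WITH (NOLOCK));

    SET @RowsDeleted = @@ROWCOUNT;

    COMMIT TRANSACTION;

    IF @RowsDeleted > 0
        WAITFOR DELAY '00:01:00';   -- breathing room between batches; skipped after the final, empty batch
END;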
September 22, 2016 at 2:45 pm
Withdrawn. I'm being critical. Is it Friday yet?
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
September 22, 2016 at 3:24 pm
The Dixie Flatline (9/22/2016)
Withdrawn. I'm being critical. Is it Friday yet?
YES!! It is Friday!! Unfortunately, it's not Friday here, yet. :Whistling:
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA