March 5, 2013 at 9:26 pm
Comments posted to this topic are about the item Hidden Tricks To SQL Server Table Cleanup
March 6, 2013 at 12:31 am
I think the creation of a separate view is unnecessary when you can have the view inline:
DECLARE @Rowcount int
SET @Rowcount = -1 -- initialise so the loop runs at least once
WHILE @Rowcount <> 0
BEGIN
    ;WITH CTE AS
    (
        SELECT TOP (2000) *
        FROM EventTracking.dbo.Event
        WHERE EventCloseDate < GETDATE() - 6 -- remove rows older than 6 days
        ORDER BY EventId
    )
    DELETE CTE
    SET @Rowcount = @@ROWCOUNT -- stop once a batch deletes nothing
END
Alternatively:
DECLARE @Rowcount int
SET @Rowcount = -1 -- initialise so the loop runs at least once
WHILE @Rowcount <> 0
BEGIN
    DELETE EventTracking.dbo.Event
    WHERE EventId IN (SELECT TOP (2000) EventId
                      FROM EventTracking.dbo.Event
                      WHERE EventCloseDate < GETDATE() - 6 -- remove rows older than 6 days
                      ORDER BY EventId)
    SET @Rowcount = @@ROWCOUNT -- stop once a batch deletes nothing
END
March 6, 2013 at 1:52 am
Most people would never go beyond the bounds of a BIGINT with an IDENTITY(1,1), but for those of us who do, remember that reseeding identity values is incompatible with this method.
Mind you, at those sorts of scales you probably have Enterprise Edition and can use partition switching to do even faster deletes with minimal IO impact (a rough sketch follows)!
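A rough sketch of the partition-switch approach, assuming Event is partitioned on a date-based key and EventArchive is an empty table with an identical schema on the same filegroup (the table names and partition number are illustrative):
-- Move the oldest partition out of the table; this is a metadata-only operation
ALTER TABLE EventTracking.dbo.Event SWITCH PARTITION 1 TO EventTracking.dbo.EventArchive
-- The aged rows can then be dropped almost instantly
TRUNCATE TABLE EventTracking.dbo.EventArchive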
I take it that EventCloseDate being a NULLable field means that you don't know in advance what the close date will be?
If you did know the EventCloseDate upfront, then I'd look at putting the clustered key on it. Either that, or persist the EventStartDate and put the clustered key over EventStartDate and Duration. Obviously this would need evaluating and testing, as events arriving out of sequence will cause some degree of fragmentation. Whether such fragmentation would be significant or an issue for you is something only you can determine. All standard "It Depends" stuff.
March 6, 2013 at 3:42 am
Alternatively, we can use this too.. it takes about an hour to delete a million records
SET ROWCOUNT 50000
DELETE FROM EventTracking.dbo.Event
WHILE @@ROWCOUNT > 0
BEGIN
    SET ROWCOUNT 50000
    DELETE FROM EventTracking.dbo.Event
END
SET ROWCOUNT 0
______________________________________________________________
Every Problem has a Solution; Every Solution has a Problem 🙂
March 6, 2013 at 4:10 am
Agree with the last poster. Basically, DELETE TOP 1000000 FROM EVENT doesn't work, but DELETE TOP (1000000) FROM EVENT does. SET ROWCOUNT works similarly, but TOP has the benefit that you can use a variable. So one of the statements in the article is very misleading, unless you're using SQL Server 2000, where DELETE TOP (1000000) won't work.
This tip is still useful for me, as I have groups of tables I have to delete in chunks, in order of last ID, in the same sequence across the sets of tables.
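For instance, the batch size can be parameterized with TOP (a minimal sketch; @BatchSize is illustrative, and the inline initialiser needs SQL Server 2008 or later):
DECLARE @BatchSize int = 50000
DELETE TOP (@BatchSize) FROM EventTracking.dbo.Event
WHERE EventCloseDate < GETDATE() - 6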
This DBA says - "It depends".
March 6, 2013 at 4:19 am
The code in the article doesn't check that EventCloseDate is older than 6 days when it does the delete. Is it that EventCloseDate is always present and monotonically increasing with EventId?
It also won't always delete all the rows: for example, if there are 5000 rows that need deleting, it will only delete the first 4000.
March 6, 2013 at 5:01 am
Itzik Ben-Gan teaches:
WHILE 1 = 1
BEGIN
    DELETE TOP (5000) FROM table WHERE .....
    IF @@ROWCOUNT < 5000 BREAK
END
and you can add a WAITFOR to delay between iterations. Adjust the 5000 to suit your table.
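A minimal sketch applying this pattern to the thread's Event table, with the optional WAITFOR included (batch size and delay are arbitrary):
WHILE 1 = 1
BEGIN
    DELETE TOP (5000) FROM EventTracking.dbo.Event
    WHERE EventCloseDate < GETDATE() - 6
    IF @@ROWCOUNT < 5000 BREAK -- a short batch means the backlog is cleared
    WAITFOR DELAY '00:00:01' -- breathing room for concurrent work
END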
March 6, 2013 at 6:02 am
Why keep counts at all?
Let all the logic live in the view; then, while any row remains in the view, keep deleting:
ALTER PROCEDURE [dbo].[usp_VTrickleDeleteEvent]
AS
SET NOCOUNT ON
SET QUOTED_IDENTIFIER OFF
BEGIN
    WHILE EXISTS (SELECT 1 FROM VEventCleanup)
    BEGIN
        BEGIN TRAN RemoveEvent
            DELETE FROM VEventCleanup -- delete from view with ORDER BY
        COMMIT TRAN -- COMMIT to release locks
    END
END
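For reference, a hypothetical definition of the VEventCleanup view this relies on; the TOP and ORDER BY inside the view are what make each DELETE remove a fixed-size, ordered batch (column and table names follow the article's example):
CREATE VIEW dbo.VEventCleanup
AS
SELECT TOP (2000) *
FROM EventTracking.dbo.Event
WHERE EventCloseDate < GETDATE() - 6 -- rows older than 6 days
ORDER BY EventId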
March 6, 2013 at 6:09 am
What about the transaction log? I have worked in an environment where disk space was hard to come by. I had a similar need, but I also had to add CHECKPOINTs to the code, because the transaction log was filling up the disk.
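A minimal sketch of that, assuming the database uses the SIMPLE recovery model (under FULL recovery it is log backups, not CHECKPOINT, that let log space be reused):
WHILE 1 = 1
BEGIN
    DELETE TOP (2000) FROM EventTracking.dbo.Event
    WHERE EventCloseDate < GETDATE() - 6
    IF @@ROWCOUNT = 0 BREAK
    CHECKPOINT -- allow log truncation between batches
END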
You might be asking, "Why not just add more disk?" You would have to understand the company I worked for. BTW, I don't work there anymore :).
March 6, 2013 at 6:46 am
This is excellent. We also use the DELETE TOP (###) syntax. Works great.
BTW, it is 'stored procedure' not 'store procedure.' Sorry for picking the nit.
March 6, 2013 at 7:27 am
Adding a view would have helped my little process that deletes pollution from an audit trigger. A view would have simplified my efforts!
My twist for others to consider is adding a pause to let other, possibly locked-out, processes jump in:
WHILE EXISTS (SELECT TOP 1 RECORDID
              FROM [Audit].[dbo].SCHOOL
              WHERE ChangeBy IN ('<snip>'
                                ,'NT AUTHORITY\SYSTEM'))
BEGIN
    DELETE [Audit].[dbo].SCHOOL
    FROM [Audit].[dbo].SCHOOL x
    INNER JOIN (SELECT TOP 10000 RECORDID
                FROM [Audit].[dbo].SCHOOL
                WHERE ChangeBy IN ('<snip>'
                                  ,'NT AUTHORITY\SYSTEM')) y
        ON x.RECORDID = y.RECORDID
    WAITFOR DELAY '00:00:01' -- pause so blocked sessions can get in
END
March 6, 2013 at 7:51 am
Here is one technique I use when there is a need to perform bulk deletes on a large table. It is not an online operation, but it should be able to complete the job on a multi-GB table within 10 minutes. Not only is this one of the fastest methods, but it minimizes tempdb usage, minimizes transaction logging, and the end result leaves the table with organized indexes. A sketch of the round trip follows the steps.
#1 BCP a SELECT of the records you want to keep into a native-format file (-n option).
#2 Truncate the table.
#3 Drop the indexes.
#4 BCP from the file into the table. Specify the -E option if you want to insert the original identity values.
#5 Re-create the indexes.
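A hedged sketch of that round trip against the thread's Event table; server, file, and index names are placeholders, and steps #1 and #4 run from the command line (shown here as comments):
-- #1: bcp "SELECT * FROM EventTracking.dbo.Event WHERE EventCloseDate >= GETDATE() - 6" queryout keep.dat -n -S MyServer -T
TRUNCATE TABLE EventTracking.dbo.Event -- #2
DROP INDEX IX_Event_EventCloseDate ON EventTracking.dbo.Event -- #3 (hypothetical index name)
-- #4: bcp EventTracking.dbo.Event in keep.dat -n -E -S MyServer -T
CREATE INDEX IX_Event_EventCloseDate ON EventTracking.dbo.Event (EventCloseDate) -- #5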
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
March 6, 2013 at 7:54 am
OK, I am not sure I am on board with the example as written. If you know you are going to be cleaning up the database regularly, I would personally cluster on the EventCloseDate field to ensure I can use it for fast removal, and make the primary key (the identity column) nonclustered. This is for multiple reasons.
1) I don't see anything at all saying EventID 2 couldn't end months after EventIDs 3-2000. And since your TOP doesn't actually take that into account, you could in theory delete something still valid under your rules.
2) If your purpose is to remove values from a table where a particular column is your target, then that column as a rule must be referenced in your query; otherwise you may miss values or get more values than expected.
3) You are using CEILING in your query, so if the calculation is 880.000000001 you will make 881 loops. In the 881st loop you will wipe out data which has a date value >= your 6-day cutoff, because there is no sanity check. I suggest using FLOOR; you will have some values left hanging, but in the business world it is better to trim too little than too much. Refer back to 1.
4) I would use your method after altering the indexes to ensure I can use EntryDate as my qualifier and trim exactly what I need (a sketch follows below).
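A hypothetical sketch of that index change (the constraint and index names are assumptions):
ALTER TABLE EventTracking.dbo.Event DROP CONSTRAINT PK_Event
CREATE CLUSTERED INDEX CIX_Event_EventCloseDate ON EventTracking.dbo.Event (EventCloseDate)
ALTER TABLE EventTracking.dbo.Event ADD CONSTRAINT PK_Event PRIMARY KEY NONCLUSTERED (EventId)
-- Deletes can then qualify directly on the clustered key:
DELETE TOP (2000) FROM EventTracking.dbo.Event WHERE EventCloseDate < GETDATE() - 6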
BTW, the title is a bit misleading; breaking processes into smaller transactions has long been known as the best way to reduce contention.
March 6, 2013 at 7:55 am
karthik babu (3/6/2013)
Alternatively, we can use this too.. it takes about an hour to delete a million records
SET ROWCOUNT 50000
DELETE FROM EventTracking.dbo.Event
WHILE @@ROWCOUNT > 0
BEGIN
    SET ROWCOUNT 50000
    DELETE FROM EventTracking.dbo.Event
END
SET ROWCOUNT 0
Although I'm fond of that technique as well, it's deprecated syntax: http://msdn.microsoft.com/en-us/library/ms143729.aspx.
- Jeff
March 6, 2013 at 7:55 am
My concern about the application of this technique is its reliance on the primary key being date-specific.
In short, it selects the top N rows where the date < 6 days ago, ordered by primary key.
As soon as a row is added with a date >= 6 days ago, this query will skip any qualifying records that follow it, records which should otherwise have been deleted.
You can still use the batch method, which is really the key to the whole performance issue, and very timely. The difference is that you need to step through the whole table (see the sketch below).
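A minimal sketch of stepping through the whole table by key range, assuming EventId is the primary key (batch size is arbitrary):
DECLARE @LastId int = 0
DECLARE @MaxId int = (SELECT ISNULL(MAX(EventId), 0) FROM EventTracking.dbo.Event)
WHILE @LastId < @MaxId
BEGIN
    DELETE FROM EventTracking.dbo.Event
    WHERE EventId > @LastId AND EventId <= @LastId + 2000
      AND EventCloseDate < GETDATE() - 6 -- the date test runs on every batch
    SET @LastId = @LastId + 2000
END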