April 23, 2012 at 8:15 am
Hi folks,
I have a daily process that deletes all but the last 28 days of data
The table holds on average 387,758,627 records at any one time and (obviously adds records daily) So to maintain performance we delete anything older than the 28th day.
This process runs for about 12 minutes at 2am each morning, but during those 12 minutes we often get time-outs from other processes that are trying to use other tables in the database.
We are running on 32 cores of CPU with 128GB RAM and many TB of disk but I wonder if my SQL delete is not written in the best way:
###########################################################
--update liv_css_dp_counts
-- in order to keep 28 days of data we need to delete the data for one 28 days + 1 day
-- This must run and complete before 12:00 every day
declare @today as int
declare @tomorrow as int
select @today = DATEPART(day,getdate())
select @tomorrow = DATEPART(day,dateadd(day,1,GETDATE()))
declare @loopvar as int
select @loopvar= (@today+1) % 32
while @loopvar % 32 <> (@tomorrow+1) %32
begin
delete from liv_css_dp_counts where liv_css_dp_counts.day=@loopvar %32
select @loopvar=@loopvar+1
end
####################################################################
The table liv_css_dp_counts is partitioned on the day column so in theory is spread over 28 tables
There is only one key on the table which is *not* the day field.
Any ideas anyone ?
Kind Regards
Steve
April 23, 2012 at 8:41 am
Yes that type of delete is going to perform horribly, especially with that much data. There is no need for a loop here at all. This should be a single delete statement. Without ddl it is hard to say what it should look like. You have a column named day? Do you have a column with a datetime datatype?
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
April 23, 2012 at 8:53 am
Thanks for replying.
The table is:
[id] [bigint] IDENTITY(1,1) NOT NULL,
[PostCode] [varchar](8) NULL,
[DeliveryPoint] [varchar](2) NULL,
[Quantity] [int] NULL,
[css_session_id] [int] NOT NULL,
[day] [int] NULL,
[month] [int] NULL
The only key is the css_session_id which is used to get the data from this table and link it to the parent records.
The date is on the parent record - 1 parent record has around 5-8k child records in this table.
But if I use the parent record to control the delete it takes 10 times the amount of time to complete due to the overhead of managing the relationship (I assume)
April 23, 2012 at 8:54 am
sorry should have added that the loop is to catch those months where there are more than 28 days
April 23, 2012 at 9:06 am
Why have day and month as columns with a datatype of int? Can you replace those with a datetime column, which could be the same value as the parent table? Then your delete is trivial. I think you are suffering the wrath of not using proper datatypes.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
April 23, 2012 at 9:08 am
Another idea would be to use cascading deletes. You said that is slower? My guess is due to indexes.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
April 23, 2012 at 9:23 am
OK, I can add a date into the row, and I will see how that goes.
I figured that looking at an INT when deleting would be more efficient than working with a function on a date.
But I really am quite new to SQL so bow to your greater expertise.
It will probably take a couple of days to get the change through but I'll post back the results.
Steve
April 23, 2012 at 9:29 am
JokerUK (4/23/2012)
OK, I can add a date into the row, and I will see how that goes.I figured that looking at an INT when deleting would be more efficient than working with a function on a date.
But I really am quite new to SQL so bow to your greater expertise.
It will probably take a couple of days to get the change through but I'll post back the results.
Steve
LOL no bowing around here...and certainly not to me.
If you can add the same datetime as the parent then your delete becomes a single delete.
delete from liv_css_dp_counts
where NewDateTimeCol < DATEADD(d, -28, getdate())
No need for loops at all. Give it a shot and see if that doesn't make things a lot easier to code and quite a bit faster than looping.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
April 23, 2012 at 10:34 am
just my 2ct.
If your single transaction delete takes to long, a loop delete may be appropriate for your case.
declare @tranrows int
declare @rwcount int
set @tranrows = 50000
while 0=0
begin
begin tran
delete top ( @tranrows )
from liv_css_dp_counts
where NewDateTimeCol < DATEADD(d, -28, getdate())
set @rwcount = @@rowcount
commit tran
if @rwcount < @tranrows BREAK
end
This way, you'll have way more smaller transactions, but less blocking to ongoing operation.
Test it, Test IT
Make sure your filtering column is indexed !
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
April 24, 2012 at 12:59 am
OK,
I added a datefield and retro-populated exisiting records with the date from the parent record.
I also modified the data-loading app so that new records are populated with the date.
The good news is that this mornings delete did not cause any timeouts - I will need to monitor for a few days as the timeouts did not happen every day.
Interestingly through the delete process has taken slightly longer - it went from 12 mins yesterday to 14 mins today.
I will look to add the reduced transaction /smaller batch delete over the next day if the problem does not go away.
Cheers all
Steve
April 24, 2012 at 4:32 am
Now that you have a batched process, the right way to go, you need to worry about how the optimizer is handling it. Do you have an index on the column that is filtering for the delete statement? What kind of foreign key constraints are on this table? All those foreign key constraints will be checked as part of the delete, so you might need indexes there as well. If you want advise in that area, post the execution plan.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 24, 2012 at 6:59 am
Thanks for that response,
I think that I've attached the execution plan
Kind Regards
Steve
April 24, 2012 at 7:08 am
Based on that plan you're still doing a loop.
But, based on the plan, you're getting a table scan for each delete. The predicate is here: [MIS].[dbo].[LIV_CSS_DP_Counts].[day]=[@todelete]
That suggests that an index on the [day] column might be helpful. Also, unless I'm misreading the plan, this is a heap table? No clustered index? Unless you have a good reason, I'd suggest adding a clustered index to the table. Most tables, most of the time, benefit from a clustered index. It's just an improved storage mechanism within SQL Server.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 24, 2012 at 7:13 am
Grant Fritchey (4/24/2012)
...Also, unless I'm misreading the plan, this is a heap table? No clustered index? Unless you have a good reason, I'd suggest adding a clustered index to the table. Most tables, most of the time, benefit from a clustered index. It's just an improved storage mechanism within SQL Server.
Delete from heap table doesn't reclaim any space. You better to add a clustered index to this table as suggested by Grant.
April 24, 2012 at 7:52 am
Thanks folks,
Just to be clear, the delete is not about reclaiming space, it's about removing data to keep a manageable amount online for reporting The table has about 330M rows for 28 days data and the deletes average about 6-10 million rows per day.
The Table is partitioned on the 'day' field which is an integer rather than date.
I can add a clustered index to the table - that is no problem, the net aim is to get rid of the time-outs for other transactions on the server when the delete process is executing.
I really do appreciate your help and time on this.
Steve
Viewing 15 posts - 1 through 15 (of 22 total)
You must be logged in to reply to this topic. Login to reply