Record Delete is slowing server

  • Hi folks,

    I have a daily process that deletes all but the last 28 days of data

    The table holds on average 387,758,627 records at any one time and (obviously adds records daily) So to maintain performance we delete anything older than the 28th day.

    This process runs for about 12 minutes at 2am each morning, but during those 12 minutes we often get time-outs from other processes that are trying to use other tables in the database.

    We are running on 32 cores of CPU with 128GB RAM and many TB of disk but I wonder if my SQL delete is not written in the best way:

    ###########################################################

    --update liv_css_dp_counts

    -- in order to keep 28 days of data we need to delete the data for one 28 days + 1 day

    -- This must run and complete before 12:00 every day

    declare @today as int

    declare @tomorrow as int

    select @today = DATEPART(day,getdate())

    select @tomorrow = DATEPART(day,dateadd(day,1,GETDATE()))

    declare @loopvar as int

    select @loopvar= (@today+1) % 32

    while @loopvar % 32 <> (@tomorrow+1) %32

    begin

    delete from liv_css_dp_counts where liv_css_dp_counts.day=@loopvar %32

    select @loopvar=@loopvar+1

    end

    ####################################################################

    The table liv_css_dp_counts is partitioned on the day column so in theory is spread over 28 tables

    There is only one key on the table which is *not* the day field.

    Any ideas anyone ?

    Kind Regards

    Steve

  • Yes that type of delete is going to perform horribly, especially with that much data. There is no need for a loop here at all. This should be a single delete statement. Without ddl it is hard to say what it should look like. You have a column named day? Do you have a column with a datetime datatype?

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thanks for replying.

    The table is:

    [id] [bigint] IDENTITY(1,1) NOT NULL,

    [PostCode] [varchar](8) NULL,

    [DeliveryPoint] [varchar](2) NULL,

    [Quantity] [int] NULL,

    [css_session_id] [int] NOT NULL,

    [day] [int] NULL,

    [month] [int] NULL

    The only key is the css_session_id which is used to get the data from this table and link it to the parent records.

    The date is on the parent record - 1 parent record has around 5-8k child records in this table.

    But if I use the parent record to control the delete it takes 10 times the amount of time to complete due to the overhead of managing the relationship (I assume)

  • sorry should have added that the loop is to catch those months where there are more than 28 days

  • Why have day and month as columns with a datatype of int? Can you replace those with a datetime column, which could be the same value as the parent table? Then your delete is trivial. I think you are suffering the wrath of not using proper datatypes.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Another idea would be to use cascading deletes. You said that is slower? My guess is due to indexes.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • OK, I can add a date into the row, and I will see how that goes.

    I figured that looking at an INT when deleting would be more efficient than working with a function on a date.

    But I really am quite new to SQL so bow to your greater expertise.

    It will probably take a couple of days to get the change through but I'll post back the results.

    Steve

  • JokerUK (4/23/2012)


    OK, I can add a date into the row, and I will see how that goes.

    I figured that looking at an INT when deleting would be more efficient than working with a function on a date.

    But I really am quite new to SQL so bow to your greater expertise.

    It will probably take a couple of days to get the change through but I'll post back the results.

    Steve

    LOL no bowing around here...and certainly not to me.

    If you can add the same datetime as the parent then your delete becomes a single delete.

    delete from liv_css_dp_counts

    where NewDateTimeCol < DATEADD(d, -28, getdate())

    No need for loops at all. Give it a shot and see if that doesn't make things a lot easier to code and quite a bit faster than looping.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • just my 2ct.

    If your single transaction delete takes to long, a loop delete may be appropriate for your case.

    declare @tranrows int

    declare @rwcount int

    set @tranrows = 50000

    while 0=0

    begin

    begin tran

    delete top ( @tranrows )

    from liv_css_dp_counts

    where NewDateTimeCol < DATEADD(d, -28, getdate())

    set @rwcount = @@rowcount

    commit tran

    if @rwcount < @tranrows BREAK

    end

    This way, you'll have way more smaller transactions, but less blocking to ongoing operation.

    Test it, Test IT

    Make sure your filtering column is indexed !

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • OK,

    I added a datefield and retro-populated exisiting records with the date from the parent record.

    I also modified the data-loading app so that new records are populated with the date.

    The good news is that this mornings delete did not cause any timeouts - I will need to monitor for a few days as the timeouts did not happen every day.

    Interestingly through the delete process has taken slightly longer - it went from 12 mins yesterday to 14 mins today.

    I will look to add the reduced transaction /smaller batch delete over the next day if the problem does not go away.

    Cheers all

    Steve

  • Now that you have a batched process, the right way to go, you need to worry about how the optimizer is handling it. Do you have an index on the column that is filtering for the delete statement? What kind of foreign key constraints are on this table? All those foreign key constraints will be checked as part of the delete, so you might need indexes there as well. If you want advise in that area, post the execution plan.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks for that response,

    I think that I've attached the execution plan

    Kind Regards

    Steve

  • Based on that plan you're still doing a loop.

    But, based on the plan, you're getting a table scan for each delete. The predicate is here: [MIS].[dbo].[LIV_CSS_DP_Counts].[day]=[@todelete]

    That suggests that an index on the [day] column might be helpful. Also, unless I'm misreading the plan, this is a heap table? No clustered index? Unless you have a good reason, I'd suggest adding a clustered index to the table. Most tables, most of the time, benefit from a clustered index. It's just an improved storage mechanism within SQL Server.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (4/24/2012)


    ...Also, unless I'm misreading the plan, this is a heap table? No clustered index? Unless you have a good reason, I'd suggest adding a clustered index to the table. Most tables, most of the time, benefit from a clustered index. It's just an improved storage mechanism within SQL Server.

    Delete from heap table doesn't reclaim any space. You better to add a clustered index to this table as suggested by Grant.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Thanks folks,

    Just to be clear, the delete is not about reclaiming space, it's about removing data to keep a manageable amount online for reporting The table has about 330M rows for 28 days data and the deletes average about 6-10 million rows per day.

    The Table is partitioned on the 'day' field which is an integer rather than date.

    I can add a clustered index to the table - that is no problem, the net aim is to get rid of the time-outs for other transactions on the server when the delete process is executing.

    I really do appreciate your help and time on this.

    Steve

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply