Delete data: Offline or Online?

  • SQL Server 2000

    I have a database with 9 months' worth of data and am trying to get it down to 6 months' worth. The database was created by a third-party application and I have to use a 'purge' command they supply. I'm trying to find the quickest way to delete a set amount of data (not all of it).

    Currently, I am deleting data while the database is online and in use. It takes about 3 hours to delete around 1 million rows.

    I am unable to test doing the delete with the database offline, and we don't have a test system. What I would like to know is: has anyone done a comparison of the time it takes to delete from an offline database versus an online one?

    -SQLBill

  • Might it be easier to insert the records that are not to be deleted into a copy table with the same structure (you could use NOLOCK if no one is updating the table), and when the insert finishes, drop the old table and rename the new table to the old table's name? Something like the sketch below.
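
    A minimal sketch of that copy-and-swap approach, assuming hypothetical names (yourtable, yourtable_keep, and the PurgeDate cutoff column are all placeholders); note that SELECT INTO does not carry over indexes, constraints, or permissions, so those would have to be recreated on the new table:

    -- Copy only the rows to keep (NOLOCK is only safe if nobody is updating the table)
    SELECT *
    INTO yourtable_keep
    FROM yourtable (NOLOCK)
    WHERE PurgeDate >= DATEADD(month, -6, GETDATE())

    -- Swap the tables; recreate indexes, constraints, and permissions by hand
    DROP TABLE yourtable
    EXEC sp_rename 'yourtable_keep', 'yourtable'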

  • Nope. There are several interconnected tables, and the 3rd-party app deletes from several of them and 'cleans up' data in other tables based on what it deletes from the primary table.

    Bottom line is that I need to use their purge command to delete the data. I just don't have a test bed to find out which is quicker: deleting the data while the database has users connected, or running it in single-user mode. I'm positive it would be quicker to do the delete with only me connected (switching to single-user mode, as sketched after this post), but how much faster? I need to be able to give my management a guesstimate of how much time I need the database offline.

    -SQLBill
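
    As an aside, here is a minimal sketch of taking a SQL Server 2000 database into single-user mode for such a window and back out again (YourDatabase is a placeholder name); WITH ROLLBACK IMMEDIATE disconnects current users and rolls back their open transactions:

    -- Disconnect everyone else and take the database to single-user mode
    ALTER DATABASE YourDatabase SET SINGLE_USER WITH ROLLBACK IMMEDIATE

    -- ... run the vendor's purge command here ...

    -- Return the database to normal multi-user access
    ALTER DATABASE YourDatabase SET MULTI_USER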

  • Really difficult to estimate without a test bed. What is certain is that with no users connected, the only locks created will be from the delete statement.

    Could you not batch the delete instead of doing one big delete?

    CREATE TABLE #ProccessTable (
        id        int NOT NULL,
        Processed bit NOT NULL CONSTRAINT ProccessTable_Processed DEFAULT (0)
    )

    CREATE UNIQUE CLUSTERED INDEX ID_ind ON #ProccessTable (id)

    -- Capture the ids to purge up front so the base table is only touched in small batches
    INSERT INTO #ProccessTable (id)
    SELECT id
    FROM yourtable (NOLOCK)
    -- your where clause

    WHILE (SELECT COUNT(*) FROM #ProccessTable WHERE Processed = 0) > 0
    BEGIN
        -- ORDER BY id keeps the deleted batch and the flagged batch identical
        DELETE yourtable
        WHERE id IN (SELECT TOP 1000 id FROM #ProccessTable WHERE Processed = 0 ORDER BY id)

        UPDATE #ProccessTable
        SET Processed = 1
        WHERE id IN (SELECT TOP 1000 id FROM #ProccessTable WHERE Processed = 0 ORDER BY id)
    END

    DROP TABLE #ProccessTable

    This assumes there is a unique identifier on the table the records need deleting from. It may still take a long time, but if you gauge the batch size right, the deletes will occur with minimal locking and your transaction logs should not bloat (see the recovery-model note below). The result should be that the deletes happen and the end users are not affected.
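
    One hedged caveat on the log: whether it stays small depends on the database's recovery model. A minimal sketch of what might go at the end of each loop iteration above (YourDatabase and the backup path are placeholders):

    -- SIMPLE recovery: an explicit CHECKPOINT lets committed log space be reused between batches
    CHECKPOINT
    -- FULL recovery: back up the log between batches instead, for example:
    -- BACKUP LOG YourDatabase TO DISK = 'D:\Backups\YourDatabase_log.trn'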

  • That is basically what the 3rd-party purge does. It breaks the delete down to 100 rows at a time. My question isn't about how to do the delete; I have to use their command. My question is: is it faster to do the delete in single-user mode, and if so, how much faster?

    -SQLBill

  • I didn't twig that the 3rd-party purge was batched. Unfortunately, without a test bed it's pretty difficult to estimate the time difference.

  • It seems obvious that it would be faster in single-user mode, but I don't know how you would estimate the time difference with no test system.

    Have you tried profiling the system during deletes? Is there a lot of blocking (a quick check is sketched at the end of this post)? If there is, it would be a strong argument for having a periodic maintenance window for single-user deletes.

    Doing small deletes more frequently would probably have less impact on the system, if their purge command is something you can schedule.
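
    One quick way to look for that blocking on SQL Server 2000 is the blocked column in master..sysprocesses (sp_who2 shows the same information); a minimal sketch, run while the purge is active:

    -- List connections that are currently blocked and the spid blocking them
    SELECT spid, blocked, waittime, lastwaittype, cmd, hostname, program_name
    FROM master..sysprocesses (NOLOCK)
    WHERE blocked <> 0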

  • "It seems obvious that it would be faster in single-user mode, but I don't know how you would estimate the time difference with no test system."

    This is why I'm asking; my hope is that someone with a test system may have already tested this.

    I already schedule the purge, but we are a 24x7 shop, so it affects my users regardless of when it is scheduled. Also, I currently take in more data daily than I can delete.

    -SQLBill

  • Maybe I should rephrase my statement. I don't see how someone with no knowledge of your hardware, application, or server load could possibly make an informed statement about the speed difference, regardless of the testing they may have done on their own systems.

    I can say that using Profiler to show that traffic from other connections is frequently blocking the deletes from the purge connection would be sufficient proof for me to tell management that I needed a single-user maintenance window. If there is no obvious blocking, you might try to collect statistics on read/write counts for the purge connection versus total read/write counts to estimate what percentage of the server's performance is used by the purge (a rough sketch follows at the end of this post).

    Can you get a one-time maintenance window to run a single-user purge as an experiment? That may be the only way to get an accurate answer. Does the application vendor have any useful advice, or do any of their other customers have this problem?

    If the powers that be expect the system to run 24x7 with no maintenance window, and you're collecting more data every day than you can delete, I think we can all agree that a disaster is looming.  Nobody believes you when you say the canoe is headed for a waterfall, but it will be your fault when it goes over.  Maybe you should estimate the annual data growth and submit a purchase order for a SAN system to hold 10 years of data to get their attention.
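
    On the read/write counts mentioned above: master..sysprocesses in SQL Server 2000 carries cumulative cpu and physical_io counters per connection, so sampling it twice and differencing the numbers gives a rough idea of how much of the load belongs to the purge spid versus everyone else. A minimal sketch:

    -- Cumulative CPU and physical I/O per connection since it logged in
    SELECT spid, loginame, hostname, program_name, cpu, physical_io
    FROM master..sysprocesses (NOLOCK)
    ORDER BY physical_io DESC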

  • As with everything in management, you show them there is a problem and what the best solution is, and they say they'll get around to it. We are looking at filtering the data we get, which would make it more manageable, but they've been talking about it for six months now and nothing has been approved.

    I'm not going to get approved downtime, unless I can estimate how long I'll be down.

    I was hoping that someone had benchmarked this type of thing and could say "for me, it was 4 times faster in single-user mode than with users connected." Then I could tell the powers-that-be "I need one hour of test downtime and I estimate I will be able to remove one day's worth of data; the downtime will let me confirm that." But right now, without an estimate, deleting one day would have us down for 4 hours, and that would be unacceptable.

    -SQLBill

  • All I can say is what Scott already warned you about. You are the only one who can determine how much is going on in your system. You are the only one who knows what your hardware is. You are the only one who knows the DB structure and which tables are being deleted from.

    That said, if you have plenty of hardware resources and the tables you are deleting from are 'history'-type tables, you probably won't feel the difference; but if they are 'online'-type tables, then single-user mode will give you an edge when there are a lot of users accessing them.

    24x7 is not supposed to be taken literally; you need maintenance windows regardless.

    Cheers,

     


    * Noel
