A big Delete

  • I know you guys will have some great ideas for this:

    I've got a table with just over 100,000,000 rows. Each row can be as wide as 158 bytes. I need to delete around 70,000,000 of these rows and I'm seeking the fastest solution.

    Some things to keep in mind:

    Concurrency is an issue. Off-peak, SELECTs number 3-5/min, UPDATEs 5-8/min, INSERTs 0. For this reason, we have to keep blocking to an absolute minimum. If we decide to SELECT the 30M out of the table, rather than DELETE the 70M, we can SELECT WITH (NOLOCK) and clean up the drift later.

    The indexes are based on typical access patterns. There is a clustered index on Cust_num, Order_Id. Order_ID is a nonclustered PK. There are two columns that we can use to determine which rows we delete/keep: OrderNum is a VARCHAR(55) whose first 8 bytes are the date the record was created, and there is also a datestamp.

    Here is the basic table structure:

    CREATE TABLE [dbo].[orders] (
           [order_id] [varchar] (55) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
           [cust_num] [varchar] (16) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
           [creation_date] [datetime] NULL ,
           --....
           CONSTRAINT [PKN__orders__order_id] PRIMARY KEY NONCLUSTERED
           (
                   [order_id]
           ) ON [INDEXES]
    ) ON [PRIMARY]
    GO

    CREATE UNIQUE CLUSTERED INDEX [IXC__orders__cust_num] ON [dbo].[orders]([cust_num]) ON [PRIMARY]
    GO

    An order_id looks like this:

    20060719121929-9af44fee-33bd-48f3-b286-efd23f8aaa77
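
    Since the leading characters sort as yyyymmdd, a range predicate on the column itself should be sargable against the nonclustered PK. A sketch, where the cutoff value is hypothetical:

    -- count candidate deletes without taking shared locks;
    -- '20060501' is a hypothetical cutoff, not our real criterion
    SELECT COUNT(*)
    FROM [dbo].[orders] WITH (NOLOCK)
    WHERE order_id < '20060501'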

    I'm very interested to hear your response.

    Thanks!

    Robert Cary

    SQL guy and Houston Magician

    DECLARE @rc int
    SET @rc = 1
    SET ROWCOUNT 500    -- limit each DELETE to 500 rows
    WHILE @rc > 0
    BEGIN
        DELETE FROM [dbo].[orders]
        WHERE <your criteria>
        SET @rc = @@ROWCOUNT
        WAITFOR DELAY '00:00:02.000' -- to let other processes do their job
    END
    SET ROWCOUNT 0      -- restore the default
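
    On SQL Server 2005 and later, the same batching can be written with DELETE TOP (n), since SET ROWCOUNT is deprecated for data-modification statements. A sketch, with the criteria left as a placeholder:

    DECLARE @rc int
    SET @rc = 1
    WHILE @rc > 0
    BEGIN
        -- TOP replaces SET ROWCOUNT as the batch limiter
        DELETE TOP (500) FROM [dbo].[orders]
        WHERE <your criteria>
        SET @rc = @@ROWCOUNT
        WAITFOR DELAY '00:00:02.000'
    END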

    _____________
    Code for TallyGenerator

  • Robert,

    Shifting gears a bit, how's the performance of inserts of new records?  How about for a returning customer with a new order?  Your clustered index is VERY contrary to the Primary Key and doesn't seem to have all that you indicated... according to what you posted, a customer may only have 1 entry in the table, because the unique key is on Cust_num alone and not on the combination of columns you indicated in your write-up.  Of course, maybe you assign a new Cust_Num even for returning customers (in which case, you may have an even bigger problem with the design).  This is from your first posting...

    CREATE UNIQUE CLUSTERED INDEX [IXC__orders__cust_num] ON [dbo].[orders]([cust_num]) ON [PRIMARY]
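
    To make that concrete: with the index as posted, a second row for the same customer would be rejected. A sketch with made-up values (the second order_id and the cust_num are invented):

    -- first order for the customer succeeds
    INSERT [dbo].[orders] (order_id, cust_num)
    VALUES ('20060719121929-9af44fee-33bd-48f3-b286-efd23f8aaa77', 'CUST0001')
    -- second order for the same customer violates IXC__orders__cust_num
    INSERT [dbo].[orders] (order_id, cust_num)
    VALUES ('20060720080000-1b2c3d4e-5f60-4a7b-8c9d-0e1f2a3b4c5d', 'CUST0001')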

    Also, Sergiy's code works great for something like this, but you need to be aware of the simple math here... if you are trying to delete 70,000,000 rows 500 at a time with a 2-second delay, that's 140,000 batches and about 78 hours of delay time alone... don't stop the job just because it seems to be taking a long time...

    Also, I'd have to say things have gotten pretty much out of hand in that table.  Once you get it cleaned up, you might want to consider running Sergiy's code as a scheduled job at least once a day.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Have you thought about archiving this data on another server? I doubt that this type of information is to be deleted... ever!

  • It looks like I made a mistake in the Schema def. The Clustered Unique key is actually on Cust_num, Order_Id.

    Almost all SELECTs join on cust_num to another table that is also clustered on cust_num.

    It's also worth mentioning that the orders table does not contain orders in the conventional sense of the word. For this reason, we only need to keep three months' worth of data.

    I like Sergiy's approach and have something similar to that in place as an ongoing nightly process. But I'm working on the big cleanup right now.

    The access is typically returning all (or the top 100) orders for a customer. Occasionally we look up a single order by order_num. Again, the table is clustered on cust_num, order_num.

    SQL guy and Houston Magician

  • It's worth pointing out that I have about a 5-hour window to run this, so it may need to span multiple days. One approach I had considered would be to select the 30,000,000 keepers out into a new table, switch the tables, and clean up any drift. I fear, though, that an insert in a single transaction would be huge (several GB) and that multiple inserts would require multiple scans.
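
    For reference, the batched copy-out I'm weighing would look something like this. A sketch only; orders_keep, the batch size, and the cutoff date are all hypothetical, and the table switch and drift cleanup are not shown:

    -- hypothetical empty copy of the table, clustered the same way
    SELECT TOP 0 * INTO dbo.orders_keep FROM dbo.orders
    CREATE UNIQUE CLUSTERED INDEX IXC_keep ON dbo.orders_keep (cust_num, order_id)

    DECLARE @last varchar(55), @rc int
    SET @last = ''
    SET @rc = 1
    WHILE @rc > 0
    BEGIN
        -- copy the next slice of keepers in order_id order,
        -- so each INSERT is a small transaction
        INSERT INTO dbo.orders_keep
        SELECT TOP 100000 *
        FROM dbo.orders WITH (NOLOCK)
        WHERE order_id > @last
          AND order_id >= '20060501'   -- hypothetical keep cutoff
        ORDER BY order_id
        SET @rc = @@ROWCOUNT
        SELECT @last = MAX(order_id) FROM dbo.orders_keep
    END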

    Looking at the index, my colleague has suggested building a table of rows to delete, keyed on the clustered index (cust_num, order_num). That would be a much smaller transaction and would enable index seeks on the rows to delete. This may be the best option, but I am very interested to hear what the SQL Server intelligentsia have to add.
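
    A sketch of that staging-table idea; orders_to_delete and the cutoff date are hypothetical, and the throttling mirrors the nightly process:

    -- build the victim list once, keyed like the clustered index
    SELECT cust_num, order_id
    INTO dbo.orders_to_delete
    FROM dbo.orders WITH (NOLOCK)
    WHERE order_id < '20060501'   -- hypothetical delete cutoff
    CREATE CLUSTERED INDEX IXC_del ON dbo.orders_to_delete (cust_num, order_id)

    -- then delete in small batches, seeking on the clustered key
    DECLARE @rc int
    SET @rc = 1
    SET ROWCOUNT 500
    WHILE @rc > 0
    BEGIN
        DELETE o
        FROM dbo.orders o
        INNER JOIN dbo.orders_to_delete d
                ON d.cust_num = o.cust_num
               AND d.order_id = o.order_id
        SET @rc = @@ROWCOUNT
        WAITFOR DELAY '00:00:02.000'
    END
    SET ROWCOUNT 0

    Because the delete joins on the full clustered key, each batch can seek straight to its victims instead of scanning, and the live table is only touched by the small deletes.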

    Thanks so much!

    SQL guy and Houston Magician

  • If you run Sergiy's method, you can let it "crawl" all day... it will not interfere with normal daily processes...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

    SELECT ... INTO <new table> FROM <old table> WHERE <rows to keep>

    DROP TABLE <old table>

    EXEC sp_rename '<new table>', '<old table>'

    barring any "constraints"



    PeteK
    I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.
