Deleting and locking...

  • I have been asked to help remove a lot of records from a HUGE table. The target table has over a billion rows and consumes about 800GB. I've been asked to remove records older than a certain date. This huge table doesn't include a date column, but another, also huge, table does, and the tables have corresponding record ids... so basically, I'm doing:

    delete from hugetable where recid in

    (select recid from otherhugetable where time < 'date')

    Whether I do this as the nested query above or as a join on recid doesn't seem to matter... the query locks the entire huge table.

    I've also tried

    delete from hugetable with (rowlock) where recid

    in (select recid from otherhugetable with (nolock) where time <'date')

    same problem...

    any tips or thoughts would be appreciated
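    One pattern that often helps with exactly this symptom (a sketch, not a drop-in script; it assumes the table and column names above, and SQL Server 2000-era syntax since EM is mentioned later in the thread): delete in small batches so each transaction holds few enough locks that the engine never escalates to a table lock, and the log can be reused between batches.

    ```sql
    -- Sketch: batched delete to avoid lock escalation.
    -- SET ROWCOUNT limits each DELETE to 5000 rows (works on SQL 2000;
    -- on 2005+ you could use DELETE TOP (5000) instead).
    SET ROWCOUNT 5000

    WHILE 1 = 1
    BEGIN
        DELETE h
        FROM hugetable h
        INNER JOIN otherhugetable o ON h.recid = o.recid
        WHERE o.time < 'date'   -- placeholder date, as in the original post

        IF @@ROWCOUNT = 0 BREAK -- nothing left to delete
    END

    SET ROWCOUNT 0              -- restore normal behavior
    ```

    The batch size is a guess; small enough to stay under the lock-escalation threshold, large enough that the loop finishes in reasonable time.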

  • I'm not sure if it would help much in this scenario, but you could eliminate some of the bottleneck on the subquery by populating a temp table:

    SELECT recid

    INTO #tmpDeleteRecId

    FROM otherhugetable where time < 'date'

    delete from hugetable where recid in

    (select recid from #tmpDeleteRecId)

    DROP TABLE #tmpDeleteRecId

    And a couple of questions...

    Do you have a foreign key relationship between hugetable and otherhugetable? What is the total count of records that need to be deleted, compared to the total records in hugetable?

  • Or something like the following, which may be faster depending on the amount of data you are deleting:

    --this saves the data you want to retain

    select h.* into NEWhugetable from hugetable h

    inner join otherhugetable o on h.recid = o.recid

    where o.time >= 'date'

    script indexes/constraints from hugetable (right-click the table via EM and generate scripts)

    drop table hugetable

    exec sp_rename 'NEWhugetable', 'hugetable' -- or rename via EM

    recreate the indexes/constraints created via step 2

    OR: be sure you have an index on otherhugetable covering (recid, time), script the indexes from hugetable, drop the indexes on hugetable, run your delete script, and then recreate your indexes on hugetable. Placing the database into simple recovery mode before doing the delete will help as well. Be sure to have a full backup prior to the delete. Once the delete is complete, place the database into full recovery again, do another full backup, and follow with your normal log backups.
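    The recovery-model switch described above can be done in T-SQL (a sketch; 'MyDatabase' is a placeholder for the actual database name):

    ```sql
    -- Placeholder database name; substitute the real one.
    -- Simple recovery keeps the log from growing during the mass delete.
    ALTER DATABASE MyDatabase SET RECOVERY SIMPLE

    -- ... run the delete here ...

    -- Back to full recovery, then take a full backup to restart the log chain.
    ALTER DATABASE MyDatabase SET RECOVERY FULL
    ```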


  • Unfortunately, the reason I have been asked to do this work is that the server was allowed to get to less than 3% available disk space... which means temp tables and reindexing are not an option at this point. I think I will have to suffer through some painful deletes until enough space is freed to index date and recid.

    Thanks for the tips.

    W.

  • To answer your questions: there is an FK relationship on recid, and I'm deleting about 15% - roughly 300 million records out of about 2 billion.

  • In that case, a couple of options could be:

    1. If NULL references are not allowed, meaning every record in hugetable must have corresponding (1:M) records in otherhugetable, you could first run:

    DELETE FROM otherhugetable where time < 'date'

    and then run:

    DELETE FROM hugetable where recid not in(SELECT DISTINCT recid from otherhugetable)

    2. Using the option provided by tosscrosby, you could export your data to a linked server/excel file off the local server.
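    One note on the NOT IN delete in option 1: on tables this size, a NOT EXISTS form (a sketch below, same placeholder names) often gets a better plan, since it avoids materializing the DISTINCT list and can probe an index on otherhugetable.recid directly.

    ```sql
    -- Sketch: NOT EXISTS equivalent of the NOT IN delete in option 1.
    -- Removes hugetable rows that no longer have a match in otherhugetable.
    DELETE FROM hugetable
    WHERE NOT EXISTS (
        SELECT 1
        FROM otherhugetable o
        WHERE o.recid = hugetable.recid
    )
    ```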

  • Do you have the option to temporarily map an external drive and set the tempDB to use that drive during this process?
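    If that mapping is possible, moving tempdb is a metadata change (a sketch; 'T:' is a placeholder drive, and the logical file names tempdev/templog are the defaults - check them with sp_helpfile in tempdb first). The new location takes effect only after the SQL Server service is restarted.

    ```sql
    -- Sketch: point tempdb's data and log files at the temporarily mapped drive.
    -- Takes effect after a service restart.
    ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'T:\tempdb.mdf')
    ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'T:\templog.ldf')
    ```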
