(Quickly) Delete records from a table, with condition

  • Hello all,

    I'm supporting a database that I inherited so now having to handle some issues that are now arising.  In particular, I have a stored proc which just started giving me fits. This proc builds a temp table with a bunch of records within a date range, then deletes records within that same date range from a physical table (Table A), then inserts the records from the temp table into that physical table -- essentially providing new updated data in that physical table.

    The sp is now hanging on the DELETE FROM TABLE A WHERE DATE BETWEEN X AND X.

    I don't know if there's a better way to delete and insert a large number of records but was reading something about a variation, using a derived table like so:

    delete t1

    from (select top (10000) *

            from t1 order by a) t1

    I was going to rearrange my DELETE statement to something like:


    DELETE TABLE A
    FROM
       (SELECT *
        FROM TABLE A
       WHERE DATE BETWEEN X AND X)

    But the syntax looked a little odd to me.  Mind you, there's no primary key on TABLE A so I can't do a DELETE FROM TABLE A WHERE ID IN (SELECT ID FROM #TEMPTABLE)

    Any thoughts out there on the best way to handle this?

    (BTW, after the delete is a pretty typical insert statement:  INSERT TABLE A (col1, col2...) SELECT X, Y, Z FROM #TEMPTABLE)

    Any help would be greatly appreciated!
    Thanks

  • Tacy,

    Assuming you are not in a position to add a PK I would suggest adding an Index on your Date Field and depending how "wide" you data is to cover the rest of the table in the include.

    Regards,
    Matt

  • Thanks, Matt.  But I'm not sure I'm following.  There is an index on the date column. (Non Unique, Non Clustered).  I'm looking to delete everything in the physical table in that date range. 
    What exactly are you suggesting?

  • Can you give us an idea how many rows are in the table and how many, typically, are in one deletion batch?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • There are around 1.5 million rows in the physical table.  I'm trying to delete around 70K-100K rows on the delete statement.

  • Wild idea:
    Why don't you simply update the table instead of deleting and re-inserting?

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • An update is not an option here. 

    Any other ideas?  I'm getting desperate....

  • tacy.highland - Friday, May 5, 2017 2:20 PM

    An update is not an option here. 

    Any other ideas?  I'm getting desperate....

    This will require some code changes, but have you considered adding a new BIT column to your table called IsDeleted (or whatever).
    Instead of deleting the rows in the proc, just set the IsDeleted flag to 1.
    Have a separate process running periodically to batch delete the rows where IsDeleted = 1.
    Any code accessing the table would have to be modified to exclude rows where IsDeleted = 1.
    This would speed up your proc, yet achieve the same end result.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • My original thought was that since there was no PK there where no indexes either.  Sorry that won't work.
    My only other thought is to do the opposite of what you are trying to do.  Copy the Good Data into a new/temp table, truncate the old table, copy the good data back into you table or similarly do it with a series of renaming tables.
    Hope this helps.

    Regards,
    Matt

  • I can't truncate the table because I'm only updating/reinserting some records, not all.  That's why I have to specify the date range in the delete statement.

  • Something like the following perhaps?

    USE tempdb;
    GO

    -- Create and populate a 1.5M row TestTable...
    IF OBJECT_ID('tempdb.dbo.TestData', 'U') IS NULL
    BEGIN
        CREATE TABLE dbo.TestData (
            ID INT NOT NULL PRIMARY KEY CLUSTERED,
            SomeDate DATE NOT NULL
            );

        WITH
            cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),
            cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
            cte_n3 (n) AS (SELECT 1 FROM cte_n2 a CROSS JOIN cte_n2 b),
            cte_Tally (n) AS (
                SELECT TOP 1500000
                    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                FROM
                    cte_n3 a CROSS JOIN cte_n3 b
                )
        INSERT dbo.TestData (ID, SomeDate)
        SELECT
            ID = t.n,
            SomeDate = DATEADD(dd, CHECKSUM(NEWID()) % 5000, '2010-01-01')
        FROM
            cte_Tally t;

        -- Add a nonclustered index to the TestData table to speed up the following delete operation...
        CREATE NONCLUSTERED INDEX ix_TestData_SomeDate ON dbo.TestData (SomeDate);
    END;

    --=====================================================================

    -- Create a temp table to hold the deleted rows...
    IF OBJECT_ID('tempdb..#Deleted', 'U') IS NOT NULL
    DROP TABLE #Deleted;

    CREATE TABLE #Deleted (
        ID INT,
        SomeDate DATE
        );

    -- Delete the desired range of rows and use the OUTPUT clause to add the deleted rows to the temp table...
    DELETE td
        OUTPUT Deleted.*
        INTO #Deleted
    FROM
        dbo.TestData td
    WHERE
        td.SomeDate >= '2010-01-01'
        AND td.SomeDate < '2012-01-01';

    --=====================================================================
    -- >>> Deleted rows are now in #Deleted and able to be "updated" as desired... <<<
    --=====================================================================

    -- Reinsert the previously deleted (and subsiquently updated) rows back to the original table...
    INSERT dbo.TestData (ID, SomeDate)
    SELECT
        d.ID,
        d.SomeDate
    FROM
        #Deleted d;

  • tacy.highland - Friday, May 5, 2017 2:20 PM

    An update is not an option here. 

    Any other ideas?  I'm getting desperate....

    Why is UPDATE not an option?
    Your table doesn't have a PK, does it have a clustered index?
    Is there an index on the date column?


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • tacy.highland - Friday, May 5, 2017 12:17 PM

    The sp is now hanging on the DELETE FROM TABLE A WHERE DATE BETWEEN X AND X.

    If it was working fine before and is now suddenly "hanging", that's a pretty good indication of two things... 1) parameter sniffing and 2) statistics that haven't been rebuilt in a while.

    The first can be fixed by adding an option to recompile and should probably be done at the statement level rather than the SP level.  Even that won't help, though, if statistics are out of date.  The fix for that, of course, is to rebuild the statistics.  If the SP is used on a regular basis to delete and the insert a decent number of rows on a  regular basis, the number of rows may still not be big enough to trigger an auto-update of stats.  If you figure it out and do a little planning, you might want to update either index or column stats at the beginning of the proc every time it runs only for the stats that the SP uses and then rebuilt all stats that have a modified rowcount for the whole table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Saturday, May 6, 2017 8:59 PM

    tacy.highland - Friday, May 5, 2017 12:17 PM

    The sp is now hanging on the DELETE FROM TABLE A WHERE DATE BETWEEN X AND X.

    If it was working fine before and is now suddenly "hanging", that's a pretty good indication of two things... 1) parameter sniffing and 2) statistics that haven't been rebuilt in a while.

    The first can be fixed by adding an option to recompile and should probably be done at the statement level rather than the SP level.  Even that won't help, though, if statistics are out of date.  The fix for that, of course, is to rebuild the statistics.  If the SP is used on a regular basis to delete and the insert a decent number of rows on a  regular basis, the number of rows may still not be big enough to trigger an auto-update of stats.  If you figure it out and do a little planning, you might want to update either index or column stats at the beginning of the proc every time it runs only for the stats that the SP uses and then rebuilt all stats that have a modified rowcount for the whole table.

    Seems most likely to me. Less likely is that the table is a heap (we don't know yet) and repeated delete/insert cycles have increased the size of the table somewhat. There's a good article covering this here.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Are there other indexes on the table, apart of the non clustered on Date?

    _____________
    Code for TallyGenerator

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply