Delete statement taking too much time..?

  • Hi,

    Table Name: CSV_Details - Total records is 12138186 row(s),

    As below delete script taking more than 55 min and not yet competed..

    Could you suggest if any alternative ways?

    Delete from CSV_Details where CSVID not in

    --select * from CSV_Details where CSVID not in

    (

    select CSVID from CSV_Details where Mark_Rev_No_fab in

    (

    select Mark_No from Transmittal_Planning_MarkNo where Mark_No not in

    (

    select distinct Mark_Rev_No_fab from CSV_Details

    where Mark_Rev_No_fab in

    (

    select distinct Mark_No from Transmittal_Planning_MarkNo

    )

    and DeleteFlag is null

    )

    ) and DeleteFlag is not null

    ) and DeleteFlag is not null and cast(CreatedDate as date) between cast(GETDATE()-90 as date) and cast(GETDATE() as date)

    Thanks

  • First, is there any blocking going on? Second, have you looked at the execution plan to see what the pain points in the query are? If you have a table (or clustered index) scan on that large table, it's going to take a long time.

    John

  • how many records is actually being deleted?

    Good idea would be to do it in chunks.

  • Yes, or if this is part of an ETL process, consider copying the rows you want to keep to a new table, dropping the old table, and renaming the new one to the same name as the old. None of that will help, though, if the cause of the slowness is blocking, which is why that was my first question.

    John

  • Is there a trigger on CSV_Details that is fired when there are deletes? If so, that is most likely the culprit.

    Are there FKs to other tables that have ON DELETE CASCADE?

    -SQLBill

  • Part of the problem is that your query is needlessly complex. You reference CVS_Details three times and Transmittal_Planning_MarkNo twice and some of these are redundant. To understand this, I'll use set notation. I'll use C to refer to CVS_Details and T to refer to Transmittal_Planning_MarkNo. Your inner queries are essentially T - (C n T), which is exactly the same as T - C. Without details of your data, it's difficult to tell if there are similar issues with the rest of the query.

    Also, there is no need to use the DISTINCT keyword with IN/NOT IN subqueries.

    Finally, EXISTS may perform better than IN/NOT IN, and will also produce different results if any of subqueries return NULL values.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • You might try reworking/tuning your DELETE statement.

    Decent article on NOT IN alternatives: Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?

    YMMV but I'd also try identifying the values from your nested subqueries into a temp table and then JOINing to that for the delete.

    CREATE TABLE #DeleteMe (CSVID INT PRIMARY KEY CLUSTERED);

    INSERT INTO #DeleteMe (CSVID)

    select CSVID from CSV_Details where Mark_Rev_No_fab in

    (

    select Mark_No from Transmittal_Planning_MarkNo where Mark_No not in

    (

    select distinct Mark_Rev_No_fab from CSV_Details

    where Mark_Rev_No_fab in

    (

    select distinct Mark_No from Transmittal_Planning_MarkNo

    )

    and DeleteFlag is null

    )

    ) and DeleteFlag is not NULL

    DELETEcsv

    FROMCSV_Details csv

    LEFT OUTER JOIN #DeleteMe tmp ON csv.CSVID = tmp.CSVID

    WHEREtmp.CSVID IS NULL

    ANDcsv.DeleteFlag is not NULL

    ANDCAST(csv.CreatedDate as date) between cast(GETDATE()-90 as date) and cast(GETDATE() as date)

    _____________________________________________________________________
    - Nate

    @nate_hughes
  • Don't disagree with other comments. But maybe I can give you a quick fix, and maybe not.

    First, you need an index on CSV_Details keyed first by Mark_Rev_No_fab that contains CSVID. If you don't have such an index, create one.

    Then try this:

    --Delete from CSV_Details where CSVID not in

    select * from CSV_Details where CSVID not in

    (

    select CSVID from CSV_Details where Mark_Rev_No_fab in

    (

    select Mark_No from Transmittal_Planning_MarkNo tpm

    where not exists(select 1 from CSV_Details cd where cd.Mark_Rev_No_fab = tpm.Mark_No)

    and DeleteFlag is null

    ) and DeleteFlag is not null

    ) and DeleteFlag is not null

    and CreatedDate >= DATEADD(day, datediff(day, 0, GETDATE()), 0) - 90 and CreatedDate < DATEADD(day, datediff(day, 0, GETDATE()), 0)

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • First question should be how many records of the 12,000,000+ are you deleting?

  • Hi,

    Thanks to all for yours valuable points.

    1. I have taken backup of that CSV_Details tables before doing delete operation

    2. In First query got deleted total 676839 row(s) affected and took - 01:18:06 Hrs.

    3. I am stopped that second delete iteration as below query..

    DeleteFlag is not null and cast(CreatedDate as date) between cast(GETDATE()-180 as date) and cast(GETDATE()-89 as date

    First I will test that script in test setup as suggested to me Mr. RP_DBA and Mr. ScottPletcher then will confirm you..

    Thanks

  • -- Your query looks like this...

    DELETE FROM CSV_Details

    WHERE CSVID NOT IN

    (

    -- level 2

    SELECT CSVID

    FROM CSV_Details d2

    INNER JOIN Transmittal_Planning_MarkNo m2

    ON m2.Mark_No = d2.Mark_Rev_No_fab

    WHERE d2.DeleteFlag IS NOT NULL

    AND NOT EXISTS (

    -- level 1

    select d1.Mark_Rev_No_fab

    from CSV_Details d1 -- not deleted

    INNER JOIN Transmittal_Planning_MarkNo m1

    ON m1.Mark_No = d1.Mark_Rev_No_fab

    where d1.DeleteFlag is null

    and d1.Mark_Rev_No_fab = m2.Mark_No

    -- level 1

    )

    -- level 2

    ) AND DeleteFlag IS NOT NULL

    AND cast(CreatedDate as date) between cast(GETDATE()-90 as date) and cast(GETDATE() as date)

    -- Which could be as simple as this:

    -- level 2

    DELETE d2

    FROM CSV_Details d2

    WHERE d2.DeleteFlag IS NOT NULL

    AND cast(d2.CreatedDate as date) between cast(GETDATE()-90 as date) and cast(GETDATE() as date)

    AND EXISTS (

    -- level 1

    SELECT 1

    FROM CSV_Details d1 -- not deleted

    --INNER JOIN Transmittal_Planning_MarkNo m1

    --ON m1.Mark_No = d1.Mark_Rev_No_fab

    WHERE d1.DeleteFlag IS NULL

    AND d1.Mark_Rev_No_fab = d2.Mark_Rev_No_fab

    -- level 1

    )

    -- level 2

    -- Try this instead (this is the "evaluation version" i.e. will it work?):

    ;WITH Deleter AS (

    SELECT

    d.*,

    AnyDeleted = MAX(d.DeleteFlag) OVER(PARTITION BY d.Mark_Rev_No_fab)

    FROM CSV_Details d

    WHERE EXISTS (SELECT 1 FROM Transmittal_Planning_MarkNo m WHERE m.Mark_No = d.Mark_Rev_No_fab)

    AND cast(d.CreatedDate as date) between cast(GETDATE()-90 as date) and cast(GETDATE() as date)

    )

    SELECT * FROM Deleter

    -- which you would use like this:

    ;WITH Deleter AS (

    SELECT

    d.CSVID,

    DeleteFlag,

    AnyDeleted = MAX(d.DeleteFlag) OVER(PARTITION BY d.Mark_Rev_No_fab)

    FROM CSV_Details d

    WHERE EXISTS (SELECT 1 FROM Transmittal_Planning_MarkNo m WHERE m.Mark_No = d.Mark_Rev_No_fab)

    AND cast(d.CreatedDate as date) between cast(GETDATE()-90 as date) and cast(GETDATE() as date)

    )

    DELETE FROM Deleter WHERE DeleteFlag IS NULL AND AnyDeleted IS NOT NULL

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Forget everything about table structure and table size...

    Just replace NOT IN and IN from your query with EXISTS and NOT EXISTS. You will get good performance in your delete query.

  • I will try to replace NOT IN with LEFT JOIN and filtering out with left column of left table IS NOT NULL. It had helped me in many cases.

  • Walton (9/11/2015)


    I will try to replace NOT IN with LEFT JOIN and filtering out with left column of left table IS NOT NULL. It had helped me in many cases.

    Have you tried the code I posted? There's a SELECT version so you can see what the statement will do without committing to any changes.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply