Need ideas on query performance

  • I have 2 tables (suppose A and B) with around 1 million rows each. I want to find and delete the records from table A that does not exist in table B.

    The Primary key and clustered on both tables is a uniqueidentifier column. And PK of A goes into B as a foreign key.

    This query is part of a stored proc. and is taking anywhere from 90 to 120 SECONDS. It always uses clustered index scan on the table A which is consuming most of the time.

    I initially had a query which used NOT EXISTS and then I used all other options like NOT IN, LEFT JOIN, TOP. But nothing is helping out.

    I need ideas of how can I achieve this in more optimized way.

    Here are the schemas

    CREATE TABLE A(

    [PersonId] [uniqueidentifier] NOT NULL,

    [FirstName] [nvarchar](50) NULL,

    [LastName] [nvarchar](50) NULL,

    [MiddleName] [nvarchar](50) NULL,

    [Prefix] [nvarchar](20) NULL,

    [Suffix] [nvarchar](20) NULL,

    [ProfessionalSuffix] [nvarchar](20) NULL,

    [SystemEditFlag] [tinyint] NOT NULL,

    [EntityVersion] [timestamp] NOT NULL,

    CONSTRAINT [PK_A] PRIMARY KEY CLUSTERED

    (

    [PersonId] ASC

    )

    )

    CREATE TABLE B(

    [PersonId] [uniqueidentifier] NOT NULL

    --...other columns

    CONSTRAINT [PK_B] PRIMARY KEY CLUSTERED

    (

    [PersonId] ASC

    )

    )

    ALTER TABLE B WITH CHECK ADD CONSTRAINT [FK_B_A] FOREIGN KEY([PersonId])

    REFERENCES A ([PersonId])

    Thanks in advance.

    Regards

    Mayur

  • Try:

    1. Use SELECT INTO to insert records you want to stay into new table using INNER JOIN, IN or EXIST

    2. Drop the old table

    3. Rename new one to old name

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Thanks for the reply.

    But I cannot do that because A is a physical table in the database and all I'm doing is a clean up stuff. I cannot drop and recreate the whole table while there are other mainline transactions taking place.

  • Is there a non-clustered index on Table B for personid?

    Jared
    CE - Microsoft

  • mayur birari (3/13/2012)


    Thanks for the reply.

    But I cannot do that because A is a physical table in the database and all I'm doing is a clean up stuff. I cannot drop and recreate the whole table while there are other mainline transactions taking place.

    If you do it in a table which is in use and most likely participates in relationships, the way you do it looks right and time taken to process looks fine to me.

    Is you clean up one-off thing or you need to run it regularly? If the second, why not to avoid inserting not-needed records at the first place instead of doing regular clean-up (which more appropriate for log-kind tables)

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • SQLKnowItAll (3/13/2012)


    Is there a non-clustered index on Table B for personid?

    It's defined as PRIMARY KEY CLUSTERED in the OP posted DDL...

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Try this is a test environment.

    delete from TABLEA

    where exists(select Personid from TABLEA except select Personid from TABLEB);

  • Eugene Elutin (3/13/2012)


    SQLKnowItAll (3/13/2012)


    Is there a non-clustered index on Table B for personid?

    It's defined as PRIMARY KEY CLUSTERED in the OP posted DDL...

    I see that the OP did that, but it does not make sense normally (although in cases it does) he referenced it as a foreign key to table A. My assumption is that the OP made an error in his/her DDL.

    Jared
    CE - Microsoft

  • @sqlknowitall

    Yes. it does have a foreign key in table B. I agree it may not be needed since it is also a clustered PK on the same table.

    @Lynn

    I did try Except clause, it has same execution plan and takes same time.

    @Eugene

    It runs from configurable clean up utility. It runs either every night, every 15 days or once a month. Different for each customer.

    And I cannot avoid that because the same foreign key goes into 2 more tables and over the time it can have orphaned records.

  • mayur birari (3/13/2012)


    Yes.

    Yes what?

    Have you tested the code I posted?

  • mayur birari (3/13/2012)


    @SQLKnowItall

    Yes. it does have a foreign key in table B. I agree it may not be needed since it is also a clustered PK on the same table.

    @Lynn

    I did try Except clause, it has same execution plan and takes same time.

    @Eugene

    It runs from configurable clean up utility. It runs either every night, every 15 days or once a month. Different for each customer.

    And I cannot avoid that because the same foreign key goes into 2 more tables and over the time it can have orphaned records.

    No, that's still needed as a constraint if it fits your design. What I would like for you to do is run your delete query(s) without the DELETE. Make them selects. My guess is that it is not the LEFT JOIN or NOT IN, etc. causing the execution time to climb. It is the delete itself. When we purge tables in this fashion, we typically run it in batches of 100 or 1000 for 15 minutes. Of course, this depends on how thr traffic to your databases fluctuates.

    Jared
    CE - Microsoft

  • SQLKnowItAll (3/13/2012)


    mayur birari (3/13/2012)


    @SQLKnowItall

    Yes. it does have a foreign key in table B. I agree it may not be needed since it is also a clustered PK on the same table.

    @Lynn

    I did try Except clause, it has same execution plan and takes same time.

    @Eugene

    It runs from configurable clean up utility. It runs either every night, every 15 days or once a month. Different for each customer.

    And I cannot avoid that because the same foreign key goes into 2 more tables and over the time it can have orphaned records.

    No, that's still needed as a constraint if it fits your design. What I would like for you to do is run your delete query(s) without the DELETE. Make them selects. My guess is that it is not the LEFT JOIN or NOT IN, etc. causing the execution time to climb. It is the delete itself. When we purge tables in this fashion, we typically run it in batches of 100 or 1000 for 15 minutes. Of course, this depends on how thr traffic to your databases fluctuates.

    Based on the above, you could do something like the following:

    while (select count(*) from TABLEA except select Personid from TABLEB) > 0

    begin

    delete top(5000) from TABLEA -- delete 5,000 records at a time (set as needed for your appliction)

    where exists(select Personid from TABLEA except select Personid from TABLEB);

    waitfor delay '00:00:05'; -- 5 second delay

    end

  • Rereading your original post I noticed this:

    It always uses clustered index scan on the table A which is consuming most of the time.

    With an ordered list of Personid's, this is efficient. You are scanning the the index, which is basically the table since the clustered index is declared on Personid.

  • Lynn Pettis (3/13/2012)


    SQLKnowItAll (3/13/2012)


    mayur birari (3/13/2012)


    @SQLKnowItall

    Yes. it does have a foreign key in table B. I agree it may not be needed since it is also a clustered PK on the same table.

    @Lynn

    I did try Except clause, it has same execution plan and takes same time.

    @Eugene

    It runs from configurable clean up utility. It runs either every night, every 15 days or once a month. Different for each customer.

    And I cannot avoid that because the same foreign key goes into 2 more tables and over the time it can have orphaned records.

    No, that's still needed as a constraint if it fits your design. What I would like for you to do is run your delete query(s) without the DELETE. Make them selects. My guess is that it is not the LEFT JOIN or NOT IN, etc. causing the execution time to climb. It is the delete itself. When we purge tables in this fashion, we typically run it in batches of 100 or 1000 for 15 minutes. Of course, this depends on how thr traffic to your databases fluctuates.

    Based on the above, you could do something like the following:

    while (select count(*) from TABLEA except select Personid from TABLEB) > 0

    begin

    delete top(5000) from TABLEA -- delete 5,000 records at a time (set as needed for your appliction)

    where exists(select Personid from TABLEA except select Personid from TABLEB);

    waitfor delay '00:00:05'; -- 5 second delay

    end

    That's what I'm talking about! 🙂 However, if the SELECT itself is slow, we will know that there may be something else going on outside of the delete. Heck, if this thing runs daily and here are only 2 rows to delete daily, then the issue resides outside of the DELETE. I would like to isolate that piece first.

    Jared
    CE - Microsoft

  • Curious, what type of plan does this produce:

    select

    Personid

    from

    TABLEA

    except

    select

    Personid

    from

    TABLEB

    )

    delete top(5000) from ta

    from

    TABLEA ta

    inner join IdsToDelete itd

    on (ta.Personid = itd.Personid)

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply