March 13, 2012 at 9:47 am
I have 2 tables (suppose A and B) with around 1 million rows each. I want to find and delete the records from table A that does not exist in table B.
The Primary key and clustered on both tables is a uniqueidentifier column. And PK of A goes into B as a foreign key.
This query is part of a stored proc. and is taking anywhere from 90 to 120 SECONDS. It always uses clustered index scan on the table A which is consuming most of the time.
I initially had a query which used NOT EXISTS and then I used all other options like NOT IN, LEFT JOIN, TOP. But nothing is helping out.
I need ideas of how can I achieve this in more optimized way.
Here are the schemas
CREATE TABLE A(
[PersonId] [uniqueidentifier] NOT NULL,
[FirstName] [nvarchar](50) NULL,
[LastName] [nvarchar](50) NULL,
[MiddleName] [nvarchar](50) NULL,
[Prefix] [nvarchar](20) NULL,
[Suffix] [nvarchar](20) NULL,
[ProfessionalSuffix] [nvarchar](20) NULL,
[SystemEditFlag] [tinyint] NOT NULL,
[EntityVersion] [timestamp] NOT NULL,
CONSTRAINT [PK_A] PRIMARY KEY CLUSTERED
(
[PersonId] ASC
)
)
CREATE TABLE B(
[PersonId] [uniqueidentifier] NOT NULL
--...other columns
CONSTRAINT [PK_B] PRIMARY KEY CLUSTERED
(
[PersonId] ASC
)
)
ALTER TABLE B WITH CHECK ADD CONSTRAINT [FK_B_A] FOREIGN KEY([PersonId])
REFERENCES A ([PersonId])
Thanks in advance.
Regards
Mayur
March 13, 2012 at 9:51 am
Try:
1. Use SELECT INTO to insert records you want to stay into new table using INNER JOIN, IN or EXIST
2. Drop the old table
3. Rename new one to old name
March 13, 2012 at 9:57 am
Thanks for the reply.
But I cannot do that because A is a physical table in the database and all I'm doing is a clean up stuff. I cannot drop and recreate the whole table while there are other mainline transactions taking place.
March 13, 2012 at 10:05 am
Is there a non-clustered index on Table B for personid?
Jared
CE - Microsoft
March 13, 2012 at 10:08 am
mayur birari (3/13/2012)
Thanks for the reply.But I cannot do that because A is a physical table in the database and all I'm doing is a clean up stuff. I cannot drop and recreate the whole table while there are other mainline transactions taking place.
If you do it in a table which is in use and most likely participates in relationships, the way you do it looks right and time taken to process looks fine to me.
Is you clean up one-off thing or you need to run it regularly? If the second, why not to avoid inserting not-needed records at the first place instead of doing regular clean-up (which more appropriate for log-kind tables)
March 13, 2012 at 10:10 am
SQLKnowItAll (3/13/2012)
Is there a non-clustered index on Table B for personid?
It's defined as PRIMARY KEY CLUSTERED in the OP posted DDL...
March 13, 2012 at 10:10 am
Try this is a test environment.
delete from TABLEA
where exists(select Personid from TABLEA except select Personid from TABLEB);
March 13, 2012 at 10:13 am
Eugene Elutin (3/13/2012)
SQLKnowItAll (3/13/2012)
Is there a non-clustered index on Table B for personid?It's defined as PRIMARY KEY CLUSTERED in the OP posted DDL...
I see that the OP did that, but it does not make sense normally (although in cases it does) he referenced it as a foreign key to table A. My assumption is that the OP made an error in his/her DDL.
Jared
CE - Microsoft
March 13, 2012 at 11:14 am
Yes. it does have a foreign key in table B. I agree it may not be needed since it is also a clustered PK on the same table.
I did try Except clause, it has same execution plan and takes same time.
@Eugene
It runs from configurable clean up utility. It runs either every night, every 15 days or once a month. Different for each customer.
And I cannot avoid that because the same foreign key goes into 2 more tables and over the time it can have orphaned records.
March 13, 2012 at 11:20 am
mayur birari (3/13/2012)
Yes.
Yes what?
Have you tested the code I posted?
March 13, 2012 at 11:28 am
mayur birari (3/13/2012)
@SQLKnowItallYes. it does have a foreign key in table B. I agree it may not be needed since it is also a clustered PK on the same table.
I did try Except clause, it has same execution plan and takes same time.
@Eugene
It runs from configurable clean up utility. It runs either every night, every 15 days or once a month. Different for each customer.
And I cannot avoid that because the same foreign key goes into 2 more tables and over the time it can have orphaned records.
No, that's still needed as a constraint if it fits your design. What I would like for you to do is run your delete query(s) without the DELETE. Make them selects. My guess is that it is not the LEFT JOIN or NOT IN, etc. causing the execution time to climb. It is the delete itself. When we purge tables in this fashion, we typically run it in batches of 100 or 1000 for 15 minutes. Of course, this depends on how thr traffic to your databases fluctuates.
Jared
CE - Microsoft
March 13, 2012 at 11:34 am
SQLKnowItAll (3/13/2012)
mayur birari (3/13/2012)
@SQLKnowItallYes. it does have a foreign key in table B. I agree it may not be needed since it is also a clustered PK on the same table.
I did try Except clause, it has same execution plan and takes same time.
@Eugene
It runs from configurable clean up utility. It runs either every night, every 15 days or once a month. Different for each customer.
And I cannot avoid that because the same foreign key goes into 2 more tables and over the time it can have orphaned records.
No, that's still needed as a constraint if it fits your design. What I would like for you to do is run your delete query(s) without the DELETE. Make them selects. My guess is that it is not the LEFT JOIN or NOT IN, etc. causing the execution time to climb. It is the delete itself. When we purge tables in this fashion, we typically run it in batches of 100 or 1000 for 15 minutes. Of course, this depends on how thr traffic to your databases fluctuates.
Based on the above, you could do something like the following:
while (select count(*) from TABLEA except select Personid from TABLEB) > 0
begin
delete top(5000) from TABLEA -- delete 5,000 records at a time (set as needed for your appliction)
where exists(select Personid from TABLEA except select Personid from TABLEB);
waitfor delay '00:00:05'; -- 5 second delay
end
March 13, 2012 at 11:39 am
Rereading your original post I noticed this:
It always uses clustered index scan on the table A which is consuming most of the time.
With an ordered list of Personid's, this is efficient. You are scanning the the index, which is basically the table since the clustered index is declared on Personid.
March 13, 2012 at 11:39 am
Lynn Pettis (3/13/2012)
SQLKnowItAll (3/13/2012)
mayur birari (3/13/2012)
@SQLKnowItallYes. it does have a foreign key in table B. I agree it may not be needed since it is also a clustered PK on the same table.
I did try Except clause, it has same execution plan and takes same time.
@Eugene
It runs from configurable clean up utility. It runs either every night, every 15 days or once a month. Different for each customer.
And I cannot avoid that because the same foreign key goes into 2 more tables and over the time it can have orphaned records.
No, that's still needed as a constraint if it fits your design. What I would like for you to do is run your delete query(s) without the DELETE. Make them selects. My guess is that it is not the LEFT JOIN or NOT IN, etc. causing the execution time to climb. It is the delete itself. When we purge tables in this fashion, we typically run it in batches of 100 or 1000 for 15 minutes. Of course, this depends on how thr traffic to your databases fluctuates.
Based on the above, you could do something like the following:
while (select count(*) from TABLEA except select Personid from TABLEB) > 0
begin
delete top(5000) from TABLEA -- delete 5,000 records at a time (set as needed for your appliction)
where exists(select Personid from TABLEA except select Personid from TABLEB);
waitfor delay '00:00:05'; -- 5 second delay
end
That's what I'm talking about! 🙂 However, if the SELECT itself is slow, we will know that there may be something else going on outside of the delete. Heck, if this thing runs daily and here are only 2 rows to delete daily, then the issue resides outside of the DELETE. I would like to isolate that piece first.
Jared
CE - Microsoft
March 13, 2012 at 11:44 am
Curious, what type of plan does this produce:
select
Personid
from
TABLEA
except
select
Personid
from
TABLEB
)
delete top(5000) from ta
from
TABLEA ta
inner join IdsToDelete itd
on (ta.Personid = itd.Personid)
Viewing 15 posts - 1 through 15 (of 20 total)
You must be logged in to reply to this topic. Login to reply