Faster way of cleaning up a table.

Question

Roust_m

SSCoach

Points: 17410
February 6, 2011 at 9:40 pm

#388791

Hi,
I've got two databases, MyDB1 and MyDB2; some of their tables have the same structure and some differ. I need to clean up the Company table in MyDB2, leaving only records that exist in MyDB1. I can't truncate and repopulate the table from MyDB1 because of the existence of indexed views. I can't drop and re-create the indexed views either, because they may change in the next version, and the script should not have to be changed.
I use the code below to do the job, but the "NOT IN" construct is a bit slow. Is there a better, faster way, such as some sort of join?
Thanks.
SET ROWCOUNT 2000000
select top 1 * from MyDB1.[dbo].[Company] (nolock) WHERE [CompanyID] not in (
SELECT [CompanyID] FROM MyDB2.[dbo].[Company])
while @@rowcount > 0
begin
delete MyDB1.[dbo].[Company] WHERE [CompanyID] not in (
SELECT [CompanyID] FROM MyDB2.[dbo].[Company])
end
SET ROWCOUNT 0


Jeff Moden SSC Guru Points: 1004434 More actions · Answer 1

I'd probably try to delete fewer rows in every bite, but that's basically it. If you have indexed views, they'll be part of the reason it goes so slowly: they're updated during the deletes as well.
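A minimal sketch of the "fewer rows in every bite" idea, following the direction of the posted code (rows are deleted from MyDB1 when no matching CompanyID exists in MyDB2). The batch size and the variable names are assumptions for illustration, not values from the thread, and would need tuning against the real tables.

```sql
-- Hypothetical batch size; smaller bites keep each transaction
-- and its log usage short.
DECLARE @BatchSize INT = 50000;
DECLARE @Rows INT = 1;

WHILE @Rows > 0
BEGIN
    -- Delete at most @BatchSize unmatched rows per pass.
    DELETE TOP (@BatchSize) tgt
    FROM MyDB1.dbo.Company AS tgt
    WHERE NOT EXISTS (SELECT 1
                      FROM MyDB2.dbo.Company AS src
                      WHERE src.CompanyID = tgt.CompanyID);

    -- Capture the count immediately; the loop ends when nothing was deleted.
    SET @Rows = @@ROWCOUNT;
END
```

DELETE TOP (n) is used here in place of SET ROWCOUNT, which Microsoft has deprecated for data modification statements.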

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Roust_m SSCoach Points: 17410 More actions · Answer 2

I am disabling the indexes on the indexed views, so that is not a problem. The problem is with "NOT IN".
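One way the disabling step might be scripted without naming any view, so that it keeps working if the views change in a later version. This is only a sketch: it assumes it runs inside the database that owns the indexed views, and the variable names are hypothetical. It prints the statements rather than executing them.

```sql
-- Build DISABLE and REBUILD statements for every indexed view
-- in the current database.
DECLARE @disable NVARCHAR(MAX) = N'',
        @rebuild NVARCHAR(MAX) = N'';

SELECT @disable = @disable
           + N'ALTER INDEX ' + QUOTENAME(i.name)
           + N' ON ' + QUOTENAME(SCHEMA_NAME(v.schema_id)) + N'.' + QUOTENAME(v.name)
           + N' DISABLE;' + NCHAR(13) + NCHAR(10),
       @rebuild = @rebuild
           + N'ALTER INDEX ALL ON '
           + QUOTENAME(SCHEMA_NAME(v.schema_id)) + N'.' + QUOTENAME(v.name)
           + N' REBUILD;' + NCHAR(13) + NCHAR(10)
FROM sys.views AS v
JOIN sys.indexes AS i ON i.object_id = v.object_id
WHERE i.index_id = 1;   -- the clustered index that materializes the view

PRINT @disable;   -- review, then run before the delete
PRINT @rebuild;   -- review, then run after the delete
```

PRINT truncates long strings, so with many views the generated text would need to be run through sp_executesql or printed in pieces.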

I am trying something like this:

delete
-- select top 10 * from
from MyDB1.[dbo].[Company] c1
left join MyDB2.[dbo].[Company] c2
on c1.CompanyID = c2.CompanyID
WHERE c2.[CompanyID] is NULL

But it gives me a syntax error.

tyson.price SSCrazy Points: 2093 More actions · Answer 3

February 7, 2011 at 3:46 am

#1283016

Extra FROM

select top 10 * from

from

Roust_m SSCoach Points: 17410 More actions · Answer 4

That extra "from" belongs to the commented-out select statement.

I worked this out:

delete MyDB1.[dbo].[Company]
-- select top 10 * from
from MyDB1.[dbo].[Company] c1
left join MyDB2.[dbo].[Company] c2
on c1.CompanyID = c2.CompanyID
WHERE c2.[CompanyID] is NULL

The only thing is, it does not run any faster than the "not in" approach. It actually took 50% longer for some reason, perhaps because of a different load. I was expecting the two largest tables to drop from 4 hours to 1 hour, but instead they went up to 6 hours. Strange...
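Since the timings may have been affected by server load, one way to compare the two forms more fairly is to run read-only versions back to back and look at the reported reads and CPU time. The sketch below only counts the rows that would be deleted; nothing is modified. When CompanyID is not nullable, the optimizer will often produce the same or very similar anti-join plans for both forms, which would explain seeing no speed-up, but that is an assumption to check against the actual execution plans.

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- NOT IN form
SELECT COUNT(*)
FROM MyDB1.dbo.Company AS c1
WHERE c1.CompanyID NOT IN (SELECT c2.CompanyID
                           FROM MyDB2.dbo.Company AS c2);

-- LEFT JOIN ... IS NULL form
SELECT COUNT(*)
FROM MyDB1.dbo.Company AS c1
LEFT JOIN MyDB2.dbo.Company AS c2
       ON c2.CompanyID = c1.CompanyID
WHERE c2.CompanyID IS NULL;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

If both queries report similar reads and CPU time, the cost is more likely in the delete itself (logging, index maintenance) than in how the unmatched rows are found.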

rVadim Hall of Fame Points: 3969 More actions · Answer 5

I wonder if EXCEPT or INTERSECT can be used here.

Just an idea...
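A sketch of what the EXCEPT idea could look like, using the same delete direction as the code posted earlier in the thread. The alias names are hypothetical. It is not guaranteed to be any faster, since EXCEPT is likely to be resolved with a similar anti-join internally.

```sql
-- CompanyIDs present in MyDB1 but absent from MyDB2,
-- joined back to the target table to drive the delete.
DELETE tgt
FROM MyDB1.dbo.Company AS tgt
JOIN (
        SELECT CompanyID FROM MyDB1.dbo.Company
        EXCEPT
        SELECT CompanyID FROM MyDB2.dbo.Company
     ) AS gone
  ON gone.CompanyID = tgt.CompanyID;
```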

--Vadim R.

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 6

DECLARE @CurrID INT = (SELECT MIN(PrimaryKeyColumnNameHere) FROM MyDB1.[dbo].[Company]),
        @MaxID INT = (SELECT MAX(PrimaryKeyColumnNameHere) FROM MyDB1.[dbo].[Company]),
        @Interval INT = 100000

WHILE @CurrID <= @MaxID
BEGIN
    DELETE tgt
    FROM MyDB1.dbo.Company AS tgt
    LEFT JOIN MyDB2.dbo.Company AS src ON src.CompanyID = tgt.CompanyID
    WHERE tgt.PrimaryKeyColumnNameHere >= @CurrID
      AND tgt.PrimaryKeyColumnNameHere < @CurrID + @Interval
      AND src.PrimaryKeyColumnNameHere IS NULL

    SET @CurrID = @CurrID + @Interval
END

N 56°04'39.16"
E 12°55'05.25"