Removing duplicates

Question

Removing duplicates

raghu_nandan1

SSC Eights!

Points: 966
More actions
June 14, 2002 at 8:06 am

#78823

Hi,
I work in a place where two of our subsidiaries merged to the parent company. My problem is all companies used same database schema so all tables (& constraints) are the same (as they use Macola they are restricted to using the schema dictated by Macola). After merger, I have duplicate customers (thousands of them)
hence I cannot use the trick I know (group by...having), to make matters worse some of the unique constraints are on composite (5 or more) columns. I have removed the unique constraints and merged the records. Is there a way to somewhat easily eliminate the duplicates so I can rebuild the unique constraints on the merged tables. Please suggest.
Thanks for any help I can get
Raghu

Raghu

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Steve Jones - SSC Editor SSC Guru Points: 741245 More actions · Answer 1

I wrote a loop that looked for duplcates (using COUNT and HAVING COUNT(*) > 1) by name or other criteria. Then issue a SET ROWCOUNT 1 and a DELETE for each name. Continue looping until you have no more duplcates.

Kludgy and slow (er than set ops), but it worked well.

Steve Jones

steve@dkranch.net

Nazim Say Hey Kid Points: 694 More actions · Answer 2

Can you explain more on that raghu, might be post your table schema .

if am not missing anything(though am sure i am:( ) ,you can use a distinct clause and dump the data in a temp table.

or

if therez no pk , you can add a identity column .

and try something like this

select col1,col2,col3,Recono = max(identitycol)

into #temptable

from tablename

group by col1,col2,col3

having count(*) > 1

and

delete from tablename

where identitycol not in (select recono from

#temptable)

later you can drop the temp table and as well the identity column too.

HTH

Antares686 SSC Guru Points: 125444 More actions · Answer 3

Normally I would suggest create a composite index clustered on UniCol1, UniCol2, UniCol3 (your columns that should be unique) then add an IDENTITY column so you have a number reference. Then use something like

SELECT * FROM tblX oX

WHERE idX IN (SELECT TOP 1 idX FROM tblX iX

WHERE ix.UniCOl1 = ox.UniCol1 AND ix.UniCOl2 = ox.UniCol2 AND ix.UniCOl3 = ox.UniCol3)

This should only return one row per item you will want to test this on known rows to make sure not there. Once you are sure change ...idX IN (SELECT... to idX NOT IN (SELECT... And make sure you get the others instead. If test is fine the change SELECT * FROM tblX... to DELETE tblX... and let run. This will clear the duplicates. Once done drop the IDENTITY column we added and change make you unique column items a PK and once done make sure your index was saved (may have create a non-clustered as index for PK if so delete yours and change PK index to cluserted, not much should happen).

"Don't roll your eyes at me. I will tape them in place." (Teacher on Boston Public)

raghu_nandan1 SSC Eights! Points: 966 More actions · Answer 4

Thanks for all your suggestions, I can start thinking on each one of them and try it out...I will let you guys know how I made out.

Thanks again

Raghu