June 14, 2002 at 8:06 am
Hi,
I work in a place where two of our subsidiaries merged to the parent company. My problem is all companies used same database schema so all tables (& constraints) are the same (as they use Macola they are restricted to using the schema dictated by Macola). After merger, I have duplicate customers (thousands of them)
hence I cannot use the trick I know (group by...having), to make matters worse some of the unique constraints are on composite (5 or more) columns. I have removed the unique constraints and merged the records. Is there a way to somewhat easily eliminate the duplicates so I can rebuild the unique constraints on the merged tables. Please suggest.
Thanks for any help I can get
Raghu
Raghu
June 14, 2002 at 12:31 pm
I wrote a loop that looked for duplcates (using COUNT and HAVING COUNT(*) > 1) by name or other criteria. Then issue a SET ROWCOUNT 1 and a DELETE for each name. Continue looping until you have no more duplcates.
Kludgy and slow (er than set ops), but it worked well.
Steve Jones
June 15, 2002 at 2:44 am
Can you explain more on that raghu, might be post your table schema .
if am not missing anything(though am sure i am:( ) ,you can use a distinct clause and dump the data in a temp table.
or
if therez no pk , you can add a identity column .
and try something like this
select col1,col2,col3,Recono = max(identitycol)
into #temptable
from tablename
group by col1,col2,col3
having count(*) > 1
and
delete from tablename
where identitycol not in (select recono from
#temptable)
later you can drop the temp table and as well the identity column too.
HTH
June 15, 2002 at 7:53 pm
Normally I would suggest create a composite index clustered on UniCol1, UniCol2, UniCol3 (your columns that should be unique) then add an IDENTITY column so you have a number reference. Then use something like
SELECT * FROM tblX oX
WHERE idX IN (SELECT TOP 1 idX FROM tblX iX
WHERE ix.UniCOl1 = ox.UniCol1 AND ix.UniCOl2 = ox.UniCol2 AND ix.UniCOl3 = ox.UniCol3)
This should only return one row per item you will want to test this on known rows to make sure not there. Once you are sure change ...idX IN (SELECT... to idX NOT IN (SELECT... And make sure you get the others instead. If test is fine the change SELECT * FROM tblX... to DELETE tblX... and let run. This will clear the duplicates. Once done drop the IDENTITY column we added and change make you unique column items a PK and once done make sure your index was saved (may have create a non-clustered as index for PK if so delete yours and change PK index to cluserted, not much should happen).
"Don't roll your eyes at me. I will tape them in place." (Teacher on Boston Public)
June 17, 2002 at 3:26 pm
Thanks for all your suggestions, I can start thinking on each one of them and try it out...I will let you guys know how I made out.
Thanks again
Raghu
Raghu
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply