DUPLICATE RECORDS

Question

Post reply

DUPLICATE RECORDS

skrishnan

SSC Rookie

Points: 47
More actions
September 3, 2003 at 6:35 am

#358238

I have a table containing about 2500000 rows out of this i have found that 750 rows have been duplicated thats 1500 rows, is there a possibility to delete the duplicated 750 rows only
Thanks in advance

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

erajendar SSC Eights! Points: 840 More actions · Answer 1

Begin Transaction

Create table #temp(inculde the columns which are in Source table)

Insert into #temp values Select distinct * from

source table

delete from source table

Insert into source table select * from #temp

if @@error=0

commit transaction

else

rollback transaction

drop table #temp

erajendar

Crispin Proctor SSCoach Points: 18161 More actions · Answer 2

Here's a script by Chris Cubley which I have used a couple times.

http://www.sqlservercentral.com/scripts/contributions/395.asp

Cheers,

Crispin

Something as incredibly simple as

binary still gives you too many options

Cheers,CrispinI can't die, there are too many people who still have to meet me!It's not a bug, SQL just misunderstood me!

cppwiz SSCarpal Tunnel Points: 4036 More actions · Answer 3

Chris Cubley also wrote an excellent article on this topic that includes the scripts you will need to identify and remove the duplicate records. I tested the scripts on my own system and everything works A-OK !!!

Here's the article:

http://www.sqlservercentral.com/columnists/ccubley/findinganddeletingduplicatedata.asp

Greg Larsen SSC-Insane Points: 20961 More actions · Answer 4

You can also add an identity column to the table to help with deleting duplicates. Here is an example of some code to do that:

alter table dup_table

add seq_num int identity

go

delete from a

from dup_table a join

(select col1, col2, max(seq_num) max_seq_num from dup_table

group by col1, col2

having count(*) > 1) b

on a.col1 = b.col1 and

a.col2 = b.col2 and

a.seq_num < b.max_seq_num

go

alter table dup_table

drop column seq_num

Gregory Larsen, DBA

If you looking for SQL Server Examples check out my website at http://www.geocities.com/sqlserverexamples

Gregory A. Larsen, MVP

skrishnan SSC Rookie Points: 47 More actions · Answer 5

Many Thanks to all you Guys for your help

Special Thanks Greg, your Solution did the trick.

Thanks once again.

rstone SSCertifiable Points: 6213 More actions · Answer 6

If you can't touch the table structure, a loop using SET ROWCOUNT will limit the number of records deleted. Perhaps a cursor with using a select something like

SELECT Col1, Col2, Col3, Count(*) - 1

FROM TheTableWithDups

GROUP BY @Col1, @Col2, @Col3

HAVING COUNT(*) > 1

Fetch into @Col1, @Col2, @Col3, @NumberToDelete

In the cursor loop...

BEGIN TRAN -- make sure the count is and stays correct or bad things could happen!

SELECT @NumberToDelete = COUNT(*) - 1

FROM TheTableWithDups

WHERE Col1 = @Col1 AND

Col2 = @Col2 AND

Col3 = @Col3

IF @NumberToDelete > 0

BEGIN

SET ROWCOUNT = @NumberToDelete

DELETE FROM TheTableWithDups

WHERE Col1 = @Col1 AND

Col2 = @Col2 AND

Col3 = @Col3

END

COMMIT

I have not tried this, but I think it would work. If there is no chance that the table will be changed (perhaps via a table lock), then no extra check or transaction would be required.

RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉