Deleting Duplicate Data

Question

Post reply

Deleting Duplicate Data

Brian Brant

SSC Eights!

Points: 839
More actions
July 22, 2005 at 10:00 am

#88214

This is probably a simple question, but how do I delete duplicate data in a single table.
I used this query to identify the duplicates but how do I safely remove the "duplicate" without deleting the "original" so to speak.
SELECT (tblVDN_BUCKET.VDN) AS [VDN Field], Count(tblVDN_BUCKET.VDN) AS NumberOfDups
FROM tblVDN_BUCKET
GROUP BY tblVDN_BUCKET.VDN
HAVING (((Count(tblVDN_BUCKET.VDN))>1))
Thanks,
Brian

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

Greg Knaddison SSC Eights! Points: 859 More actions · Answer 1

If the entire row is duplicate you can do:

select distinct * into tblVDN_BUCKET2 from tblVDN_BUCKET

drop table tblVDN_BUCKET

Be sure to drop/create keys/indexes/etc.

If you have another unique key field, or could add an identity UniqueKey, you could also do:

delete tblVDN_BUCKET

from tblVDN_BUCKET a left outer join

(SELECT tblVDN_BUCKET.VDN, min(UniqueKey)

FROM tblVDN_BUCKET

GROUP BY tblVDN_BUCKET.VDN

HAVING Count(1)>1) b a.vdn = b.vdn and a.UniqueKey = b.UniqueKey

where b.UniqueKey is null

Ray M SSC-Insane Points: 21093 More actions · Answer 2

Create table Dups (pk int identity, Foo char(1))

Insert into Dups (foo)

Select 'a'

union all

Select 'b'

union all

Select 'c'

union all

Select 'd'

union all

Select 'e'

union all

Select 'a'

union all

Select 'c'

union all

Select 'e'

-- Shows each combination of a duplicate

select *

from dups a

join dups b on a.foo = b.foo

where a.pk <> b.pk

-- Get the max pk of each dup

select max(a.pk)

from dups a

join dups b on a.foo = b.foo

where a.pk <> b.pk

group by a.foo

-- Put in in clause of delete statement

Delete Dups

where pk in (-- Get the max pk of each dup

select max(a.pk)

from dups a

join dups b on a.foo = b.foo

where a.pk <> b.pk

group by a.foo

)

select * from dups

drop table dups

noeld SSC Guru Points: 96590 More actions · Answer 3

Unfortunately this query

Delete Dups

where pk in (-- Get the max pk of each dup

select max(a.pk)

from dups a

join dups b on a.foo = b.foo

where a.pk <> b.pk

group by a.foo

)

assumes that you will have only 2 repeated records for 3 or more you will need to repeat the statement several times

* Noel

Jesper-244176 SSCertifiable Points: 7142 More actions · Answer 4

Or use "not in" instead of "in"

My suggestion would be the following:

delete dups where pk != (select min(dups2.pk) from dups dups2 where dups2.foo = dups.foo)

but only because I think it is easier to read