July 22, 2005 at 10:00 am
This is probably a simple question, but how do I delete duplicate data in a single table.
I used this query to identify the duplicates but how do I safely remove the "duplicate" without deleting the "original" so to speak.
SELECT (tblVDN_BUCKET.VDN) AS [VDN Field], Count(tblVDN_BUCKET.VDN) AS NumberOfDups
FROM tblVDN_BUCKET
GROUP BY tblVDN_BUCKET.VDN
HAVING (((Count(tblVDN_BUCKET.VDN))>1))
Thanks,
Brian
July 22, 2005 at 10:06 am
If the entire row is duplicate you can do:
select distinct * into tblVDN_BUCKET2 from tblVDN_BUCKET
drop table tblVDN_BUCKET
Be sure to drop/create keys/indexes/etc.
If you have another unique key field, or could add an identity UniqueKey, you could also do:
delete tblVDN_BUCKET
from tblVDN_BUCKET a left outer join
(SELECT tblVDN_BUCKET.VDN, min(UniqueKey)
FROM tblVDN_BUCKET
GROUP BY tblVDN_BUCKET.VDN
HAVING Count(1)>1) b a.vdn = b.vdn and a.UniqueKey = b.UniqueKey
where b.UniqueKey is null
July 22, 2005 at 10:07 am
Create table Dups (pk int identity, Foo char(1))
Insert into Dups (foo)
Select 'a'
union all
Select 'b'
union all
Select 'c'
union all
Select 'd'
union all
Select 'e'
union all
Select 'a'
union all
Select 'c'
union all
Select 'e'
-- Shows each combination of a duplicate
select *
from dups a
join dups b on a.foo = b.foo
where a.pk <> b.pk
-- Get the max pk of each dup
select max(a.pk)
from dups a
join dups b on a.foo = b.foo
where a.pk <> b.pk
group by a.foo
-- Put in in clause of delete statement
Delete Dups
where pk in (-- Get the max pk of each dup
select max(a.pk)
from dups a
join dups b on a.foo = b.foo
where a.pk <> b.pk
group by a.foo
)
select * from dups
drop table dups
July 22, 2005 at 12:31 pm
Unfortunately this query
Delete Dups
where pk in (-- Get the max pk of each dup
select max(a.pk)
from dups a
join dups b on a.foo = b.foo
where a.pk <> b.pk
group by a.foo
)
assumes that you will have only 2 repeated records for 3 or more you will need to repeat the statement several times
* Noel
July 25, 2005 at 1:23 am
Or use "not in" instead of "in"
My suggestion would be the following:
delete dups where pk != (select min(dups2.pk) from dups dups2 where dups2.foo = dups.foo)
but only because I think it is easier to read
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply