November 5, 2009 at 5:38 am
Hi
I have a table with approximately 3.5 million rows in it. I am trying to create a CLUSTERED PK on the table, but there are duplicate rows.
Is there a quicker way to find out what the duplicates are than the code below:CREATE TABLE NonUnique
(
ColA VARCHAR(4) NULL,
ColB VARCHAR(5) NULL,
ColC DATETIME NULL,
ColD INT NULL,
ColE FLOAT(53),
ColF VARCHAR(5)
)
INSERT INTO NonUnique
(
[ColA],
[ColB],
[ColC],
[ColD],
[ColE],
[ColF]
)
SELECT 'ABC1','BBC11',GETDATE(),1,1.11,'A' UNION ALL
SELECT 'ABC2','BBC22',GETDATE(),2,2.22,'B' UNION ALL
SELECT 'ABC3','BBC33',GETDATE(),3,3.33,'C' UNION ALL
SELECT 'ABC4','BBC44',GETDATE(),4,4.44,'D' UNION ALL
SELECT 'ABC5','BBC55',GETDATE(),5,5.55,'E' UNION ALL
SELECT 'ABC7','BBC55',GETDATE(),6,6.66,'F' UNION ALL
SELECT 'ABC7','BBC77','2009-11-05 12:46:05.010',7,7.77,'G' UNION ALL
SELECT 'ABC7','BBC77','2009-11-05 12:46:05.010',8,8.88,'H'
SELECT *
FROM NonUnique
WHERE
cola IN (SELECT cola FROM NonUnique GROUP BY cola, colb, colc HAVING COUNT(*) >1) AND
colb IN (SELECT colb FROM NonUnique GROUP BY cola, colb, colc HAVING COUNT(*) >1) AND
colc IN (SELECT colc FROM NonUnique GROUP BY cola, colb, colc HAVING COUNT(*) >1)
As you can see, I have to query all the columns that I want to use as a PK individually. Is there a way to avoid these subqueries?
Thank you
November 5, 2009 at 6:23 am
If you only want to know the keys of the duplicate rows, then you can do:-
SELECT cola, colb, colc, count(*)
FROM NonUnique
GROUP BY cola, colb, colc
HAVING count(*) > 1
November 5, 2009 at 6:36 am
Thanks Ian, it appears that this is what I need.
November 6, 2009 at 12:51 am
Thanks Ian, it *is* actually exactly what I needed.
November 6, 2009 at 3:09 am
An alternative, mostly because I'm a bit bored:
SELECT ColA, ColB, colC
FROM (
SELECT ColA, ColB, colC,
rn = ROW_NUMBER() OVER (PARTITION BY ColA, ColB, ColC ORDER BY ColA)
FROM dbo.NonUnique
) T1
WHERE rn = 2
The estimated query plan has this very slightly cheaper than Ian's much more natural solution...interesting.
It is slightly different, in that it just produces one set of keys per duplicate.
Paul
November 9, 2009 at 3:34 am
Is it vailed?
select cola,colb,colc ,count(*)from NonUnique as e
where (select Count(* )from nonunique as e1 where e1.cola=e.cola and e1.colb=e.colb and e1.colc=e.colc )>1
group by cola,colb,colc
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply