A Query to Show Duplicate or Non Distinct Rows

  • Hi

    I have a table with approximately 3.5 million rows in it. I am trying to create a CLUSTERED PK on the table, but there are duplicate rows.

    Is there a quicker way to find out what the duplicates are than the code below:CREATE TABLE NonUnique

    (

    ColA VARCHAR(4) NULL,

    ColB VARCHAR(5) NULL,

    ColC DATETIME NULL,

    ColD INT NULL,

    ColE FLOAT(53),

    ColF VARCHAR(5)

    )

    INSERT INTO NonUnique

    (

    [ColA],

    [ColB],

    [ColC],

    [ColD],

    [ColE],

    [ColF]

    )

    SELECT 'ABC1','BBC11',GETDATE(),1,1.11,'A' UNION ALL

    SELECT 'ABC2','BBC22',GETDATE(),2,2.22,'B' UNION ALL

    SELECT 'ABC3','BBC33',GETDATE(),3,3.33,'C' UNION ALL

    SELECT 'ABC4','BBC44',GETDATE(),4,4.44,'D' UNION ALL

    SELECT 'ABC5','BBC55',GETDATE(),5,5.55,'E' UNION ALL

    SELECT 'ABC7','BBC55',GETDATE(),6,6.66,'F' UNION ALL

    SELECT 'ABC7','BBC77','2009-11-05 12:46:05.010',7,7.77,'G' UNION ALL

    SELECT 'ABC7','BBC77','2009-11-05 12:46:05.010',8,8.88,'H'

    SELECT *

    FROM NonUnique

    WHERE

    cola IN (SELECT cola FROM NonUnique GROUP BY cola, colb, colc HAVING COUNT(*) >1) AND

    colb IN (SELECT colb FROM NonUnique GROUP BY cola, colb, colc HAVING COUNT(*) >1) AND

    colc IN (SELECT colc FROM NonUnique GROUP BY cola, colb, colc HAVING COUNT(*) >1)

    As you can see, I have to query all the columns that I want to use as a PK individually. Is there a way to avoid these subqueries?

    Thank you

  • If you only want to know the keys of the duplicate rows, then you can do:-

    SELECT cola, colb, colc, count(*)

    FROM NonUnique

    GROUP BY cola, colb, colc

    HAVING count(*) > 1

  • Thanks Ian, it appears that this is what I need.

  • Thanks Ian, it *is* actually exactly what I needed.

  • An alternative, mostly because I'm a bit bored:

    SELECT ColA, ColB, colC

    FROM (

    SELECT ColA, ColB, colC,

    rn = ROW_NUMBER() OVER (PARTITION BY ColA, ColB, ColC ORDER BY ColA)

    FROM dbo.NonUnique

    ) T1

    WHERE rn = 2

    The estimated query plan has this very slightly cheaper than Ian's much more natural solution...interesting.

    It is slightly different, in that it just produces one set of keys per duplicate.

    Paul

  • Is it vailed?

    select cola,colb,colc ,count(*)from NonUnique as e

    where (select Count(* )from nonunique as e1 where e1.cola=e.cola and e1.colb=e.colb and e1.colc=e.colc )>1

    group by cola,colb,colc

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply