finding duplicates and show the "original" and the duplicate record

  • There are many duplicate records on my data table because users constantly register under two accounts. I have a query that helps me identify the records that have a duplicate, but it only shows one of the two records, and I need to show the two records so that I can reconcile the differences.

    The query is taken from a post on stack overflow. It gives me 196, but I need to see the 392 records.

    Or if anyone knows how to identify the duplicates and show the tow records without having to hard code any values, so I can use the query in a report, and anytime there are new duplicates, the report shows them.

    SELECT

    [groom_first_name]

    ,[groom_last_name]

    ,[bride_first_name]

    ,[bride_last_name]

    ,[wedding_date]

    ,[contact_email]

    FROM [dbo].[mp_registrations]

    GROUP BY

    [groom_first_name]

    ,[groom_last_name]

    ,[bride_first_name]

    ,[bride_last_name]

    ,[wedding_date]

    ,[contact_email]

    HAVING COUNT(*) > 1

    I hope the way I wrote this is clear. I appreciate your help.

    Thank you.

  • some create table and insert scripts would go a long way to getting you a tested solution, but I would use ROW_NUMBER() to do this and not a totals query.

    Something like:

    SELECT [groom_first_name]

    ,[groom_last_name]

    ,[bride_first_name]

    ,[bride_last_name]

    ,[wedding_date]

    ,[contact_email]

    FROM (

    SELECT

    [groom_first_name]

    ,[groom_last_name]

    ,[bride_first_name]

    ,[bride_last_name]

    ,[wedding_date]

    ,[contact_email]

    ,ROW_NUMBER() OVER (PARTITION BY [contact_email] ORDER BY [contact_email]) AS rn

    FROM [dbo].[mp_registrations]

    ) x

    WHERE x.rn = 1;

    If you want to see the original and all the duplicates, you could remove the filter at the end WHERE x.rn = 1...

  • Quick though, use COUNT with the same OVER clause as the row number (here in pietlinden's code)

    😎

    SELECT [groom_first_name]

    ,[groom_last_name]

    ,[bride_first_name]

    ,[bride_last_name]

    ,[wedding_date]

    ,[contact_email]

    FROM (

    SELECT

    [groom_first_name]

    ,[groom_last_name]

    ,[bride_first_name]

    ,[bride_last_name]

    ,[wedding_date]

    ,[contact_email]

    ,ROW_NUMBER() OVER (PARTITION BY [contact_email] ORDER BY [contact_email]) AS rn

    ,COUNT(*) OVER (PARTITION BY [contact_email]) AS cnt

    FROM [dbo].[mp_registrations]

    ) x

    WHERE x.cnt > 1;

  • (I must be learning! I was close!)

  • pietlinden (11/4/2014)


    (I must be learning! I was close!)

    Almost there;-)

    😎

  • I would take the original query and wrap it in a CTE (common Table Expression) then join the CTE back to the original table. IMHO it makes the code easier to read and to debug because you can run the CTE independatly of the outer query

    WITH CTE AS

    (

    SELECT

    [groom_first_name]

    ,[groom_last_name]

    ,[bride_first_name]

    ,[bride_last_name]

    ,[wedding_date]

    ,[contact_email]

    FROM [dbo].[mp_registrations]

    GROUP BY

    [groom_first_name]

    ,[groom_last_name]

    ,[bride_first_name]

    ,[bride_last_name]

    ,[wedding_date]

    ,[contact_email]

    HAVING COUNT(*) > 1

    )

    SELECT

    R.[groom_first_name]

    ,R.[groom_last_name]

    ,R.[bride_first_name]

    ,R.[bride_last_name]

    ,R.[wedding_date]

    ,R.[contact_email]

    FROM [dbo].[mp_registrations] R

    JOIN CTE ON

    CTE.[groom_first_name] = R.[groom_first_name]

    AND CTE.[groom_last_name] = R.[groom_last_name]

    AND CTE.[bride_first_name] = R.[bride_first_name]

    AND CTE.[bride_last_name] = R.[bride_last_name]

    AND CTE.[wedding_date] = R.[wedding_date]

    AND CTE.[contact_email] = R.[contact_email]

    ORDER BY

    R.[groom_first_name]

    ,R.[groom_last_name]

    ,R.[bride_first_name]

    ,R.[bride_last_name]

    ,R.[wedding_date]

    ,R.[contact_email]

    This will give you all the duplicate records but won't let you identify the 'Master' record. If you want to designate one record to keep you will need to use another unique field or implement a RANK() or ROW_NUMER() as already described

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply