finding duplicates and show the "original" and the duplicate record

Question

finding duplicates and show the "original" and the duplicate record

itortu

SSCommitted

Points: 1914
More actions
November 3, 2014 at 9:09 pm

#292176

There are many duplicate records on my data table because users constantly register under two accounts. I have a query that helps me identify the records that have a duplicate, but it only shows one of the two records, and I need to show the two records so that I can reconcile the differences.
The query is taken from a post on stack overflow. It gives me 196, but I need to see the 392 records.
Or if anyone knows how to identify the duplicates and show the tow records without having to hard code any values, so I can use the query in a report, and anytime there are new duplicates, the report shows them.
SELECT
[groom_first_name]
,[groom_last_name]
,[bride_first_name]
,[bride_last_name]
,[wedding_date]
,[contact_email]
FROM [dbo].[mp_registrations]
GROUP BY
[groom_first_name]
,[groom_last_name]
,[bride_first_name]
,[bride_last_name]
,[wedding_date]
,[contact_email]
HAVING COUNT(*) > 1
I hope the way I wrote this is clear. I appreciate your help.
Thank you.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

pietlinden SSC Guru Points: 63503 More actions · Answer 1

some create table and insert scripts would go a long way to getting you a tested solution, but I would use ROW_NUMBER() to do this and not a totals query.

Something like:

SELECT [groom_first_name]

,[groom_last_name]

,[bride_first_name]

,[bride_last_name]

,[wedding_date]

,[contact_email]

FROM (

SELECT

[groom_first_name]

,[groom_last_name]

,[bride_first_name]

,[bride_last_name]

,[wedding_date]

,[contact_email]

,ROW_NUMBER() OVER (PARTITION BY [contact_email] ORDER BY [contact_email]) AS rn

FROM [dbo].[mp_registrations]

) x

WHERE x.rn = 1;

If you want to see the original and all the duplicates, you could remove the filter at the end WHERE x.rn = 1...

Eirikur Eiriksson SSC Guru Points: 182911 More actions · Answer 2

Quick though, use COUNT with the same OVER clause as the row number (here in pietlinden's code)

😎

SELECT [groom_first_name]

,[groom_last_name]

,[bride_first_name]

,[bride_last_name]

,[wedding_date]

,[contact_email]

FROM (

SELECT

[groom_first_name]

,[groom_last_name]

,[bride_first_name]

,[bride_last_name]

,[wedding_date]

,[contact_email]

,ROW_NUMBER() OVER (PARTITION BY [contact_email] ORDER BY [contact_email]) AS rn

,COUNT(*) OVER (PARTITION BY [contact_email]) AS cnt

FROM [dbo].[mp_registrations]

) x

WHERE x.cnt > 1;

pietlinden SSC Guru Points: 63503 More actions · Answer 3

pietlinden

SSC Guru

Points: 63503

November 4, 2014 at 12:43 pm

#1756602

(I must be learning! I was close!)

Eirikur Eiriksson SSC Guru Points: 182911 More actions · Answer 4

pietlinden (11/4/2014)
(I must be learning! I was close!)

Almost there;-)

😎

aaron.reese SSChampion Points: 13498 More actions · Answer 5

I would take the original query and wrap it in a CTE (common Table Expression) then join the CTE back to the original table. IMHO it makes the code easier to read and to debug because you can run the CTE independatly of the outer query

WITH CTE AS

(

SELECT

[groom_first_name]

,[groom_last_name]

,[bride_first_name]

,[bride_last_name]

,[wedding_date]

,[contact_email]

FROM [dbo].[mp_registrations]

GROUP BY

[groom_first_name]

,[groom_last_name]

,[bride_first_name]

,[bride_last_name]

,[wedding_date]

,[contact_email]

HAVING COUNT(*) > 1

)

SELECT

R.[groom_first_name]

,R.[groom_last_name]

,R.[bride_first_name]

,R.[bride_last_name]

,R.[wedding_date]

,R.[contact_email]

FROM [dbo].[mp_registrations] R

JOIN CTE ON

CTE.[groom_first_name] = R.[groom_first_name]

AND CTE.[groom_last_name] = R.[groom_last_name]

AND CTE.[bride_first_name] = R.[bride_first_name]

AND CTE.[bride_last_name] = R.[bride_last_name]

AND CTE.[wedding_date] = R.[wedding_date]

AND CTE.[contact_email] = R.[contact_email]

ORDER BY

R.[groom_first_name]

,R.[groom_last_name]

,R.[bride_first_name]

,R.[bride_last_name]

,R.[wedding_date]

,R.[contact_email]

This will give you all the duplicate records but won't let you identify the 'Master' record. If you want to designate one record to keep you will need to use another unique field or implement a RANK() or ROW_NUMER() as already described