November 3, 2014 at 9:09 pm
There are many duplicate records on my data table because users constantly register under two accounts. I have a query that helps me identify the records that have a duplicate, but it only shows one of the two records, and I need to show the two records so that I can reconcile the differences.
The query is taken from a post on stack overflow. It gives me 196, but I need to see the 392 records.
Or if anyone knows how to identify the duplicates and show the tow records without having to hard code any values, so I can use the query in a report, and anytime there are new duplicates, the report shows them.
SELECT
[groom_first_name]
,[groom_last_name]
,[bride_first_name]
,[bride_last_name]
,[wedding_date]
,[contact_email]
FROM [dbo].[mp_registrations]
GROUP BY
[groom_first_name]
,[groom_last_name]
,[bride_first_name]
,[bride_last_name]
,[wedding_date]
,[contact_email]
HAVING COUNT(*) > 1
I hope the way I wrote this is clear. I appreciate your help.
Thank you.
November 3, 2014 at 9:27 pm
some create table and insert scripts would go a long way to getting you a tested solution, but I would use ROW_NUMBER() to do this and not a totals query.
Something like:
SELECT [groom_first_name]
,[groom_last_name]
,[bride_first_name]
,[bride_last_name]
,[wedding_date]
,[contact_email]
FROM (
SELECT
[groom_first_name]
,[groom_last_name]
,[bride_first_name]
,[bride_last_name]
,[wedding_date]
,[contact_email]
,ROW_NUMBER() OVER (PARTITION BY [contact_email] ORDER BY [contact_email]) AS rn
FROM [dbo].[mp_registrations]
) x
WHERE x.rn = 1;
If you want to see the original and all the duplicates, you could remove the filter at the end WHERE x.rn = 1...
November 3, 2014 at 10:01 pm
Quick though, use COUNT with the same OVER clause as the row number (here in pietlinden's code)
😎
SELECT [groom_first_name]
,[groom_last_name]
,[bride_first_name]
,[bride_last_name]
,[wedding_date]
,[contact_email]
FROM (
SELECT
[groom_first_name]
,[groom_last_name]
,[bride_first_name]
,[bride_last_name]
,[wedding_date]
,[contact_email]
,ROW_NUMBER() OVER (PARTITION BY [contact_email] ORDER BY [contact_email]) AS rn
,COUNT(*) OVER (PARTITION BY [contact_email]) AS cnt
FROM [dbo].[mp_registrations]
) x
WHERE x.cnt > 1;
November 4, 2014 at 12:43 pm
(I must be learning! I was close!)
November 4, 2014 at 1:26 pm
pietlinden (11/4/2014)
(I must be learning! I was close!)
Almost there;-)
😎
November 5, 2014 at 3:38 am
I would take the original query and wrap it in a CTE (common Table Expression) then join the CTE back to the original table. IMHO it makes the code easier to read and to debug because you can run the CTE independatly of the outer query
WITH CTE AS
(
SELECT
[groom_first_name]
,[groom_last_name]
,[bride_first_name]
,[bride_last_name]
,[wedding_date]
,[contact_email]
FROM [dbo].[mp_registrations]
GROUP BY
[groom_first_name]
,[groom_last_name]
,[bride_first_name]
,[bride_last_name]
,[wedding_date]
,[contact_email]
HAVING COUNT(*) > 1
)
SELECT
R.[groom_first_name]
,R.[groom_last_name]
,R.[bride_first_name]
,R.[bride_last_name]
,R.[wedding_date]
,R.[contact_email]
FROM [dbo].[mp_registrations] R
JOIN CTE ON
CTE.[groom_first_name] = R.[groom_first_name]
AND CTE.[groom_last_name] = R.[groom_last_name]
AND CTE.[bride_first_name] = R.[bride_first_name]
AND CTE.[bride_last_name] = R.[bride_last_name]
AND CTE.[wedding_date] = R.[wedding_date]
AND CTE.[contact_email] = R.[contact_email]
ORDER BY
R.[groom_first_name]
,R.[groom_last_name]
,R.[bride_first_name]
,R.[bride_last_name]
,R.[wedding_date]
,R.[contact_email]
This will give you all the duplicate records but won't let you identify the 'Master' record. If you want to designate one record to keep you will need to use another unique field or implement a RANK() or ROW_NUMER() as already described
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply