Duplicate Names, but Not Duplicate IDs

Question

Duplicate Names, but Not Duplicate IDs

AndrewSQLDBA

SSC-Insane

Points: 21827
More actions
February 1, 2013 at 12:27 pm

#270182

Hello Everyone
I am working on way to show the Duplicates along with the Number Of Duplicates.
I have a table with Member Data, there are no, one and many duplicates. But the Member ID is not the Duplicate, it is the Member Name. Lets say to giggles, the name belongs to only one single person.
I would like to list each Member Name, along with each distinct MemberID that is duplicated in the table. I am not concerned about the Members with only one row.
I have tried using a CTE with the ROW_NUMBER(), and tried using the HAVING clause with a count, but I am missing something, and I think that it is really something simple. I am getting a list of Member names, a single MemberID and the Count if the Member has more than one row.
DECLARE @DuplicateMembers TABLE
(
MemberID int
, FirstName varchar(25)
, LastName varchar(25)
)
INSERT INTO @DuplicateMembers
SELECT 4663632, 'Jimmy', 'Smith' UNION ALL
SELECT 2669133, 'Jimmy', 'Smith' UNION ALL
SELECT 393382, 'Jimmy', 'Smith' UNION ALL
SELECT 2758596, 'Jimmy', 'Smith' UNION ALL
SELECT 168701, 'Jimmy', 'Smith' UNION ALL
SELECT 4394537, 'Jimmy', 'Smith' UNION ALL
SELECT 1349209, 'Sammy', 'Jones' UNION ALL
SELECT 2284743, 'Sammy', 'Jones' UNION ALL
SELECT 3907363, 'Sammy', 'Jones' UNION ALL
SELECT 7685349, 'Danny', 'Rowe' UNION ALL
SELECT 2770663, 'Danny', 'Rowe' UNION ALL
SELECT 320142, 'Danny', 'Rowe' UNION ALL
SELECT 320142, 'Gene', 'Miller' UNION ALL
SELECT 1000158919, 'Danny', 'Rowe' UNION ALL
SELECT 871838, 'Danny', 'Rowe' UNION ALL
SELECT 1000234650, 'Danny', 'Rowe' UNION ALL
SELECT 1260807, 'Mike', 'Gibson' UNION ALL
SELECT 799573, 'Mike', 'Gibson' UNION ALL
SELECT 705127, 'Mike', 'Gibson' UNION ALL
SELECT 320142, 'Clark', 'Ward' UNION ALL
SELECT 4277648, 'Mike', 'Gibson' UNION ALL
SELECT 80656, 'Mike', 'Gibson' UNION ALL
SELECT 4352016, 'Mike', 'Gibson' UNION ALL
SELECT 781853, 'Mike', 'Gibson' UNION ALL
SELECT 1301032, 'Mike', 'Gibson' UNION ALL
SELECT 320142, 'Larry', 'Cool' UNION ALL
SELECT 320142, 'Barry', 'Brown' UNION ALL
SELECT 83508, 'Don', 'Baker' UNION ALL
SELECT 1000256242, 'Don', 'Baker' UNION ALL
SELECT 1000145408, 'Don', 'Baker'
I really do not care how the result is formatted, but maybe something like this:
1000256242 Don Baker 1
1000145408 Don Baker 2
4277648 Mike Gibson 1
80656 Mike Gibson 2
4352016 Mike Gibson 3
781853 Mike Gibson 4
1301032 Mike Gibson 5
The important thing that I need to see, are the MemberIDs and the Names
Thank you everyone in advance for all your help, suggestions and advice
Andrew SQLDBA

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Sean Lange SSC Guru Points: 286590 More actions · Answer 1

I can help with the duplicates thing, that is fairly simple. What I don't understand is your desired output.

1000256242 Don Baker 1
1000145408 Don Baker 2
4277648 Mike Gibson 1
80656 Mike Gibson 2
4352016 Mike Gibson 3
781853 Mike Gibson 4
1301032 Mike Gibson 5

Why are the names listed multiple times with various counts? For example Don Baker has 3 rows in the table but in your output you have 2 rows with different counts and you only listed 2 of the 3 MemberID values.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Sean Lange SSC Guru Points: 286590 More actions · Answer 2

Is this maybe what you are looking for?

select d.*, DupeCount

from @DuplicateMembers d

join

(

select FirstName + LastName as FullName, count(*) as DupeCount

from @DuplicateMembers

group by firstname + lastname

having count(*) > 1

) dupes on dupes.FullName = d.FirstName + d.LastName

order by LastName, FirstName

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

AndrewSQLDBA SSC-Insane Points: 21827 More actions · Answer 3

Hey Sean

That is perfect. Exactly what I needed to see.

Sorry about the confusion on the output. But this works just fine. Seeing the name and the MemberID were the important thing.

Thanks again, and have a great weekend.

Andrew SQLDBA

Sean Lange SSC Guru Points: 286590 More actions · Answer 4

No problem. Glad that worked you and thanks for letting me know. Have a good one.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

ScottPletcher SSC Guru Points: 101120 More actions · Answer 5

I like the technique overall, but the concatenation will likely hurt performance: maybe only slightly, maybe a lot, depending on indexing. Thus, I suggest GROUPing and JOINing w/o any concatenation:

SELECT

dm.*, dupes.DupesCount

FROM @DuplicateMembers dm

INNER JOIN (

SELECT LastName, FirstName, COUNT(*) AS DupesCount

FROM @DuplicateMembers

GROUP BY

LastName, FirstName

HAVING

COUNT(*) > 1

) AS dupes ON

dupes.LastName = dm.LastName AND

dupes.FirstName = dm.FirstName

ORDER BY

dm.LastName, dm.FirstName, dm.MemberID

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

ScottPletcher SSC Guru Points: 101120 More actions · Answer 6

If you can do w/o the direct DupesCount, you can also use an EXISTS check:

SELECT

dm.*

FROM @DuplicateMembers dm

WHERE EXISTS (

SELECT 1

FROM @DuplicateMembers dm2

WHERE

dm2.LastName = dm.LastName AND

dm2.FirstName = dm.FirstName AND

dm2.MemberID <> dm.MemberID

)

ORDER BY

dm.LastName, dm.FirstName, dm.MemberID

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Sean Lange SSC Guru Points: 286590 More actions · Answer 7

ScottPletcher (2/1/2013)
I like the technique overall, but the concatenation will likely hurt performance: maybe only slightly, maybe a lot, depending on indexing. Thus, I suggest GROUPing and JOINing w/o any concatenation:
SELECT
dm.*, dupes.DupesCount
FROM @DuplicateMembers dm
INNER JOIN (
SELECT LastName, FirstName, COUNT(*) AS DupesCount
FROM @DuplicateMembers
GROUP BY
LastName, FirstName
HAVING
COUNT(*) > 1
) AS dupes ON
dupes.LastName = dm.LastName AND
dupes.FirstName = dm.FirstName
ORDER BY
dm.LastName, dm.FirstName, dm.MemberID

Good point Scott. Also exists as you pointed out might be an even better approach.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 8

Interesting to compare the actual execution plans generated, especially when looking at them with SQLSentry Plan Explorer (Free Edition).

DECLARE @DuplicateMembers TABLE

(

MemberID int

, FirstName varchar(25)

, LastName varchar(25)

)

INSERT INTO @DuplicateMembers

SELECT 4663632, 'Jimmy', 'Smith' UNION ALL

SELECT 2669133, 'Jimmy', 'Smith' UNION ALL

SELECT 393382, 'Jimmy', 'Smith' UNION ALL

SELECT 2758596, 'Jimmy', 'Smith' UNION ALL

SELECT 168701, 'Jimmy', 'Smith' UNION ALL

SELECT 4394537, 'Jimmy', 'Smith' UNION ALL

SELECT 1349209, 'Sammy', 'Jones' UNION ALL

SELECT 2284743, 'Sammy', 'Jones' UNION ALL

SELECT 3907363, 'Sammy', 'Jones' UNION ALL

SELECT 7685349, 'Danny', 'Rowe' UNION ALL

SELECT 2770663, 'Danny', 'Rowe' UNION ALL

SELECT 320142, 'Danny', 'Rowe' UNION ALL

SELECT 320142, 'Gene', 'Miller' UNION ALL

SELECT 1000158919, 'Danny', 'Rowe' UNION ALL

SELECT 871838, 'Danny', 'Rowe' UNION ALL

SELECT 1000234650, 'Danny', 'Rowe' UNION ALL

SELECT 1260807, 'Mike', 'Gibson' UNION ALL

SELECT 799573, 'Mike', 'Gibson' UNION ALL

SELECT 705127, 'Mike', 'Gibson' UNION ALL

SELECT 320142, 'Clark', 'Ward' UNION ALL

SELECT 4277648, 'Mike', 'Gibson' UNION ALL

SELECT 80656, 'Mike', 'Gibson' UNION ALL

SELECT 4352016, 'Mike', 'Gibson' UNION ALL

SELECT 781853, 'Mike', 'Gibson' UNION ALL

SELECT 1301032, 'Mike', 'Gibson' UNION ALL

SELECT 320142, 'Larry', 'Cool' UNION ALL

SELECT 320142, 'Barry', 'Brown' UNION ALL

SELECT 83508, 'Don', 'Baker' UNION ALL

SELECT 1000256242, 'Don', 'Baker' UNION ALL

SELECT 1000145408, 'Don', 'Baker';

WITH DupMembers AS (

SELECT

MemberID,

FirstName,

LastName,

rn = ROW_NUMBER() OVER (PARTITION BY LastName, FirstName ORDER BY MemberID),

cnt = COUNT(*) OVER (PARTITION BY LastName, FirstName)

FROM

@DuplicateMembers

)

SELECT

LastName,

FirstName,

MemberID,

rn

FROM

DupMembers

WHERE

cnt > 1;

WITH DupMembers AS (

SELECT

MemberID,

FirstName,

LastName,

-- rn = ROW_NUMBER() OVER (PARTITION BY LastName, FirstName ORDER BY MemberID),

cnt = COUNT(*) OVER (PARTITION BY LastName, FirstName)

FROM

@DuplicateMembers

)

SELECT

LastName,

FirstName,

MemberID,

cnt

FROM

DupMembers

WHERE

cnt > 1;

select d.*, DupeCount

from @DuplicateMembers d

join

(

select FirstName + LastName as FullName, count(*) as DupeCount

from @DuplicateMembers

group by firstname + lastname

having count(*) > 1

) dupes on dupes.FullName = d.FirstName + d.LastName

order by LastName, FirstName;

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 9

30 rows on a table variable isn't enough for a performance test but I do like your second solution, Lynn.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

AndrewSQLDBA SSC-Insane Points: 21827 More actions · Answer 10

Thank You Everyone for all your work and the code.

Lynn, your code ran the best. Took only a few seconds over approx 200,000 rows. Very nice

Thank you all once again, and have a great weekend.

Andrew