February 1, 2013 at 12:27 pm
Hello Everyone
I am working on way to show the Duplicates along with the Number Of Duplicates.
I have a table with Member Data, there are no, one and many duplicates. But the Member ID is not the Duplicate, it is the Member Name. Lets say to giggles, the name belongs to only one single person.
I would like to list each Member Name, along with each distinct MemberID that is duplicated in the table. I am not concerned about the Members with only one row.
I have tried using a CTE with the ROW_NUMBER(), and tried using the HAVING clause with a count, but I am missing something, and I think that it is really something simple. I am getting a list of Member names, a single MemberID and the Count if the Member has more than one row.
DECLARE @DuplicateMembers TABLE
(
MemberID int
, FirstName varchar(25)
, LastName varchar(25)
)
INSERT INTO @DuplicateMembers
SELECT 4663632, 'Jimmy', 'Smith' UNION ALL
SELECT 2669133, 'Jimmy', 'Smith' UNION ALL
SELECT 393382, 'Jimmy', 'Smith' UNION ALL
SELECT 2758596, 'Jimmy', 'Smith' UNION ALL
SELECT 168701, 'Jimmy', 'Smith' UNION ALL
SELECT 4394537, 'Jimmy', 'Smith' UNION ALL
SELECT 1349209, 'Sammy', 'Jones' UNION ALL
SELECT 2284743, 'Sammy', 'Jones' UNION ALL
SELECT 3907363, 'Sammy', 'Jones' UNION ALL
SELECT 7685349, 'Danny', 'Rowe' UNION ALL
SELECT 2770663, 'Danny', 'Rowe' UNION ALL
SELECT 320142, 'Danny', 'Rowe' UNION ALL
SELECT 320142, 'Gene', 'Miller' UNION ALL
SELECT 1000158919, 'Danny', 'Rowe' UNION ALL
SELECT 871838, 'Danny', 'Rowe' UNION ALL
SELECT 1000234650, 'Danny', 'Rowe' UNION ALL
SELECT 1260807, 'Mike', 'Gibson' UNION ALL
SELECT 799573, 'Mike', 'Gibson' UNION ALL
SELECT 705127, 'Mike', 'Gibson' UNION ALL
SELECT 320142, 'Clark', 'Ward' UNION ALL
SELECT 4277648, 'Mike', 'Gibson' UNION ALL
SELECT 80656, 'Mike', 'Gibson' UNION ALL
SELECT 4352016, 'Mike', 'Gibson' UNION ALL
SELECT 781853, 'Mike', 'Gibson' UNION ALL
SELECT 1301032, 'Mike', 'Gibson' UNION ALL
SELECT 320142, 'Larry', 'Cool' UNION ALL
SELECT 320142, 'Barry', 'Brown' UNION ALL
SELECT 83508, 'Don', 'Baker' UNION ALL
SELECT 1000256242, 'Don', 'Baker' UNION ALL
SELECT 1000145408, 'Don', 'Baker'
I really do not care how the result is formatted, but maybe something like this:
1000256242 Don Baker 1
1000145408 Don Baker 2
4277648 Mike Gibson 1
80656 Mike Gibson 2
4352016 Mike Gibson 3
781853 Mike Gibson 4
1301032 Mike Gibson 5
The important thing that I need to see, are the MemberIDs and the Names
Thank you everyone in advance for all your help, suggestions and advice
Andrew SQLDBA
February 1, 2013 at 12:42 pm
I can help with the duplicates thing, that is fairly simple. What I don't understand is your desired output.
1000256242 Don Baker 1
1000145408 Don Baker 2
4277648 Mike Gibson 1
80656 Mike Gibson 2
4352016 Mike Gibson 3
781853 Mike Gibson 4
1301032 Mike Gibson 5
Why are the names listed multiple times with various counts? For example Don Baker has 3 rows in the table but in your output you have 2 rows with different counts and you only listed 2 of the 3 MemberID values.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
February 1, 2013 at 12:49 pm
Is this maybe what you are looking for?
select d.*, DupeCount
from @DuplicateMembers d
join
(
select FirstName + LastName as FullName, count(*) as DupeCount
from @DuplicateMembers
group by firstname + lastname
having count(*) > 1
) dupes on dupes.FullName = d.FirstName + d.LastName
order by LastName, FirstName
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
February 1, 2013 at 12:53 pm
Hey Sean
That is perfect. Exactly what I needed to see.
Sorry about the confusion on the output. But this works just fine. Seeing the name and the MemberID were the important thing.
Thanks again, and have a great weekend.
Andrew SQLDBA
February 1, 2013 at 12:55 pm
No problem. Glad that worked you and thanks for letting me know. Have a good one.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
February 1, 2013 at 1:03 pm
I like the technique overall, but the concatenation will likely hurt performance: maybe only slightly, maybe a lot, depending on indexing. Thus, I suggest GROUPing and JOINing w/o any concatenation:
SELECT
dm.*, dupes.DupesCount
FROM @DuplicateMembers dm
INNER JOIN (
SELECT LastName, FirstName, COUNT(*) AS DupesCount
FROM @DuplicateMembers
GROUP BY
LastName, FirstName
HAVING
COUNT(*) > 1
) AS dupes ON
dupes.LastName = dm.LastName AND
dupes.FirstName = dm.FirstName
ORDER BY
dm.LastName, dm.FirstName, dm.MemberID
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
February 1, 2013 at 1:08 pm
If you can do w/o the direct DupesCount, you can also use an EXISTS check:
SELECT
dm.*
FROM @DuplicateMembers dm
WHERE EXISTS (
SELECT 1
FROM @DuplicateMembers dm2
WHERE
dm2.LastName = dm.LastName AND
dm2.FirstName = dm.FirstName AND
dm2.MemberID <> dm.MemberID
)
ORDER BY
dm.LastName, dm.FirstName, dm.MemberID
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
February 1, 2013 at 1:15 pm
ScottPletcher (2/1/2013)
I like the technique overall, but the concatenation will likely hurt performance: maybe only slightly, maybe a lot, depending on indexing. Thus, I suggest GROUPing and JOINing w/o any concatenation:
SELECT
dm.*, dupes.DupesCount
FROM @DuplicateMembers dm
INNER JOIN (
SELECT LastName, FirstName, COUNT(*) AS DupesCount
FROM @DuplicateMembers
GROUP BY
LastName, FirstName
HAVING
COUNT(*) > 1
) AS dupes ON
dupes.LastName = dm.LastName AND
dupes.FirstName = dm.FirstName
ORDER BY
dm.LastName, dm.FirstName, dm.MemberID
Good point Scott. Also exists as you pointed out might be an even better approach.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
February 1, 2013 at 1:29 pm
Interesting to compare the actual execution plans generated, especially when looking at them with SQLSentry Plan Explorer (Free Edition).
DECLARE @DuplicateMembers TABLE
(
MemberID int
, FirstName varchar(25)
, LastName varchar(25)
)
INSERT INTO @DuplicateMembers
SELECT 4663632, 'Jimmy', 'Smith' UNION ALL
SELECT 2669133, 'Jimmy', 'Smith' UNION ALL
SELECT 393382, 'Jimmy', 'Smith' UNION ALL
SELECT 2758596, 'Jimmy', 'Smith' UNION ALL
SELECT 168701, 'Jimmy', 'Smith' UNION ALL
SELECT 4394537, 'Jimmy', 'Smith' UNION ALL
SELECT 1349209, 'Sammy', 'Jones' UNION ALL
SELECT 2284743, 'Sammy', 'Jones' UNION ALL
SELECT 3907363, 'Sammy', 'Jones' UNION ALL
SELECT 7685349, 'Danny', 'Rowe' UNION ALL
SELECT 2770663, 'Danny', 'Rowe' UNION ALL
SELECT 320142, 'Danny', 'Rowe' UNION ALL
SELECT 320142, 'Gene', 'Miller' UNION ALL
SELECT 1000158919, 'Danny', 'Rowe' UNION ALL
SELECT 871838, 'Danny', 'Rowe' UNION ALL
SELECT 1000234650, 'Danny', 'Rowe' UNION ALL
SELECT 1260807, 'Mike', 'Gibson' UNION ALL
SELECT 799573, 'Mike', 'Gibson' UNION ALL
SELECT 705127, 'Mike', 'Gibson' UNION ALL
SELECT 320142, 'Clark', 'Ward' UNION ALL
SELECT 4277648, 'Mike', 'Gibson' UNION ALL
SELECT 80656, 'Mike', 'Gibson' UNION ALL
SELECT 4352016, 'Mike', 'Gibson' UNION ALL
SELECT 781853, 'Mike', 'Gibson' UNION ALL
SELECT 1301032, 'Mike', 'Gibson' UNION ALL
SELECT 320142, 'Larry', 'Cool' UNION ALL
SELECT 320142, 'Barry', 'Brown' UNION ALL
SELECT 83508, 'Don', 'Baker' UNION ALL
SELECT 1000256242, 'Don', 'Baker' UNION ALL
SELECT 1000145408, 'Don', 'Baker';
WITH DupMembers AS (
SELECT
MemberID,
FirstName,
LastName,
rn = ROW_NUMBER() OVER (PARTITION BY LastName, FirstName ORDER BY MemberID),
cnt = COUNT(*) OVER (PARTITION BY LastName, FirstName)
FROM
@DuplicateMembers
)
SELECT
LastName,
FirstName,
MemberID,
rn
FROM
DupMembers
WHERE
cnt > 1;
WITH DupMembers AS (
SELECT
MemberID,
FirstName,
LastName,
-- rn = ROW_NUMBER() OVER (PARTITION BY LastName, FirstName ORDER BY MemberID),
cnt = COUNT(*) OVER (PARTITION BY LastName, FirstName)
FROM
@DuplicateMembers
)
SELECT
LastName,
FirstName,
MemberID,
cnt
FROM
DupMembers
WHERE
cnt > 1;
select d.*, DupeCount
from @DuplicateMembers d
join
(
select FirstName + LastName as FullName, count(*) as DupeCount
from @DuplicateMembers
group by firstname + lastname
having count(*) > 1
) dupes on dupes.FullName = d.FirstName + d.LastName
order by LastName, FirstName;
February 1, 2013 at 6:02 pm
30 rows on a table variable isn't enough for a performance test but I do like your second solution, Lynn.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 1, 2013 at 6:23 pm
Thank You Everyone for all your work and the code.
Lynn, your code ran the best. Took only a few seconds over approx 200,000 rows. Very nice
Thank you all once again, and have a great weekend.
Andrew
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply