Duplicate Names, but Not Duplicate IDs

  • Hello Everyone

    I am working on way to show the Duplicates along with the Number Of Duplicates.

    I have a table with Member Data, there are no, one and many duplicates. But the Member ID is not the Duplicate, it is the Member Name. Lets say to giggles, the name belongs to only one single person.

    I would like to list each Member Name, along with each distinct MemberID that is duplicated in the table. I am not concerned about the Members with only one row.

    I have tried using a CTE with the ROW_NUMBER(), and tried using the HAVING clause with a count, but I am missing something, and I think that it is really something simple. I am getting a list of Member names, a single MemberID and the Count if the Member has more than one row.

    DECLARE @DuplicateMembers TABLE

    (

    MemberID int

    , FirstName varchar(25)

    , LastName varchar(25)

    )

    INSERT INTO @DuplicateMembers

    SELECT 4663632, 'Jimmy', 'Smith' UNION ALL

    SELECT 2669133, 'Jimmy', 'Smith' UNION ALL

    SELECT 393382, 'Jimmy', 'Smith' UNION ALL

    SELECT 2758596, 'Jimmy', 'Smith' UNION ALL

    SELECT 168701, 'Jimmy', 'Smith' UNION ALL

    SELECT 4394537, 'Jimmy', 'Smith' UNION ALL

    SELECT 1349209, 'Sammy', 'Jones' UNION ALL

    SELECT 2284743, 'Sammy', 'Jones' UNION ALL

    SELECT 3907363, 'Sammy', 'Jones' UNION ALL

    SELECT 7685349, 'Danny', 'Rowe' UNION ALL

    SELECT 2770663, 'Danny', 'Rowe' UNION ALL

    SELECT 320142, 'Danny', 'Rowe' UNION ALL

    SELECT 320142, 'Gene', 'Miller' UNION ALL

    SELECT 1000158919, 'Danny', 'Rowe' UNION ALL

    SELECT 871838, 'Danny', 'Rowe' UNION ALL

    SELECT 1000234650, 'Danny', 'Rowe' UNION ALL

    SELECT 1260807, 'Mike', 'Gibson' UNION ALL

    SELECT 799573, 'Mike', 'Gibson' UNION ALL

    SELECT 705127, 'Mike', 'Gibson' UNION ALL

    SELECT 320142, 'Clark', 'Ward' UNION ALL

    SELECT 4277648, 'Mike', 'Gibson' UNION ALL

    SELECT 80656, 'Mike', 'Gibson' UNION ALL

    SELECT 4352016, 'Mike', 'Gibson' UNION ALL

    SELECT 781853, 'Mike', 'Gibson' UNION ALL

    SELECT 1301032, 'Mike', 'Gibson' UNION ALL

    SELECT 320142, 'Larry', 'Cool' UNION ALL

    SELECT 320142, 'Barry', 'Brown' UNION ALL

    SELECT 83508, 'Don', 'Baker' UNION ALL

    SELECT 1000256242, 'Don', 'Baker' UNION ALL

    SELECT 1000145408, 'Don', 'Baker'

    I really do not care how the result is formatted, but maybe something like this:

    1000256242 Don Baker 1

    1000145408 Don Baker 2

    4277648 Mike Gibson 1

    80656 Mike Gibson 2

    4352016 Mike Gibson 3

    781853 Mike Gibson 4

    1301032 Mike Gibson 5

    The important thing that I need to see, are the MemberIDs and the Names

    Thank you everyone in advance for all your help, suggestions and advice

    Andrew SQLDBA

  • I can help with the duplicates thing, that is fairly simple. What I don't understand is your desired output.

    1000256242 Don Baker 1

    1000145408 Don Baker 2

    4277648 Mike Gibson 1

    80656 Mike Gibson 2

    4352016 Mike Gibson 3

    781853 Mike Gibson 4

    1301032 Mike Gibson 5

    Why are the names listed multiple times with various counts? For example Don Baker has 3 rows in the table but in your output you have 2 rows with different counts and you only listed 2 of the 3 MemberID values.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Is this maybe what you are looking for?

    select d.*, DupeCount

    from @DuplicateMembers d

    join

    (

    select FirstName + LastName as FullName, count(*) as DupeCount

    from @DuplicateMembers

    group by firstname + lastname

    having count(*) > 1

    ) dupes on dupes.FullName = d.FirstName + d.LastName

    order by LastName, FirstName

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Hey Sean

    That is perfect. Exactly what I needed to see.

    Sorry about the confusion on the output. But this works just fine. Seeing the name and the MemberID were the important thing.

    Thanks again, and have a great weekend.

    Andrew SQLDBA

  • No problem. Glad that worked you and thanks for letting me know. Have a good one.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I like the technique overall, but the concatenation will likely hurt performance: maybe only slightly, maybe a lot, depending on indexing. Thus, I suggest GROUPing and JOINing w/o any concatenation:

    SELECT

    dm.*, dupes.DupesCount

    FROM @DuplicateMembers dm

    INNER JOIN (

    SELECT LastName, FirstName, COUNT(*) AS DupesCount

    FROM @DuplicateMembers

    GROUP BY

    LastName, FirstName

    HAVING

    COUNT(*) > 1

    ) AS dupes ON

    dupes.LastName = dm.LastName AND

    dupes.FirstName = dm.FirstName

    ORDER BY

    dm.LastName, dm.FirstName, dm.MemberID

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • If you can do w/o the direct DupesCount, you can also use an EXISTS check:

    SELECT

    dm.*

    FROM @DuplicateMembers dm

    WHERE EXISTS (

    SELECT 1

    FROM @DuplicateMembers dm2

    WHERE

    dm2.LastName = dm.LastName AND

    dm2.FirstName = dm.FirstName AND

    dm2.MemberID <> dm.MemberID

    )

    ORDER BY

    dm.LastName, dm.FirstName, dm.MemberID

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (2/1/2013)


    I like the technique overall, but the concatenation will likely hurt performance: maybe only slightly, maybe a lot, depending on indexing. Thus, I suggest GROUPing and JOINing w/o any concatenation:

    SELECT

    dm.*, dupes.DupesCount

    FROM @DuplicateMembers dm

    INNER JOIN (

    SELECT LastName, FirstName, COUNT(*) AS DupesCount

    FROM @DuplicateMembers

    GROUP BY

    LastName, FirstName

    HAVING

    COUNT(*) > 1

    ) AS dupes ON

    dupes.LastName = dm.LastName AND

    dupes.FirstName = dm.FirstName

    ORDER BY

    dm.LastName, dm.FirstName, dm.MemberID

    Good point Scott. Also exists as you pointed out might be an even better approach.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Interesting to compare the actual execution plans generated, especially when looking at them with SQLSentry Plan Explorer (Free Edition).

    DECLARE @DuplicateMembers TABLE

    (

    MemberID int

    , FirstName varchar(25)

    , LastName varchar(25)

    )

    INSERT INTO @DuplicateMembers

    SELECT 4663632, 'Jimmy', 'Smith' UNION ALL

    SELECT 2669133, 'Jimmy', 'Smith' UNION ALL

    SELECT 393382, 'Jimmy', 'Smith' UNION ALL

    SELECT 2758596, 'Jimmy', 'Smith' UNION ALL

    SELECT 168701, 'Jimmy', 'Smith' UNION ALL

    SELECT 4394537, 'Jimmy', 'Smith' UNION ALL

    SELECT 1349209, 'Sammy', 'Jones' UNION ALL

    SELECT 2284743, 'Sammy', 'Jones' UNION ALL

    SELECT 3907363, 'Sammy', 'Jones' UNION ALL

    SELECT 7685349, 'Danny', 'Rowe' UNION ALL

    SELECT 2770663, 'Danny', 'Rowe' UNION ALL

    SELECT 320142, 'Danny', 'Rowe' UNION ALL

    SELECT 320142, 'Gene', 'Miller' UNION ALL

    SELECT 1000158919, 'Danny', 'Rowe' UNION ALL

    SELECT 871838, 'Danny', 'Rowe' UNION ALL

    SELECT 1000234650, 'Danny', 'Rowe' UNION ALL

    SELECT 1260807, 'Mike', 'Gibson' UNION ALL

    SELECT 799573, 'Mike', 'Gibson' UNION ALL

    SELECT 705127, 'Mike', 'Gibson' UNION ALL

    SELECT 320142, 'Clark', 'Ward' UNION ALL

    SELECT 4277648, 'Mike', 'Gibson' UNION ALL

    SELECT 80656, 'Mike', 'Gibson' UNION ALL

    SELECT 4352016, 'Mike', 'Gibson' UNION ALL

    SELECT 781853, 'Mike', 'Gibson' UNION ALL

    SELECT 1301032, 'Mike', 'Gibson' UNION ALL

    SELECT 320142, 'Larry', 'Cool' UNION ALL

    SELECT 320142, 'Barry', 'Brown' UNION ALL

    SELECT 83508, 'Don', 'Baker' UNION ALL

    SELECT 1000256242, 'Don', 'Baker' UNION ALL

    SELECT 1000145408, 'Don', 'Baker';

    WITH DupMembers AS (

    SELECT

    MemberID,

    FirstName,

    LastName,

    rn = ROW_NUMBER() OVER (PARTITION BY LastName, FirstName ORDER BY MemberID),

    cnt = COUNT(*) OVER (PARTITION BY LastName, FirstName)

    FROM

    @DuplicateMembers

    )

    SELECT

    LastName,

    FirstName,

    MemberID,

    rn

    FROM

    DupMembers

    WHERE

    cnt > 1;

    WITH DupMembers AS (

    SELECT

    MemberID,

    FirstName,

    LastName,

    -- rn = ROW_NUMBER() OVER (PARTITION BY LastName, FirstName ORDER BY MemberID),

    cnt = COUNT(*) OVER (PARTITION BY LastName, FirstName)

    FROM

    @DuplicateMembers

    )

    SELECT

    LastName,

    FirstName,

    MemberID,

    cnt

    FROM

    DupMembers

    WHERE

    cnt > 1;

    select d.*, DupeCount

    from @DuplicateMembers d

    join

    (

    select FirstName + LastName as FullName, count(*) as DupeCount

    from @DuplicateMembers

    group by firstname + lastname

    having count(*) > 1

    ) dupes on dupes.FullName = d.FirstName + d.LastName

    order by LastName, FirstName;

  • 30 rows on a table variable isn't enough for a performance test but I do like your second solution, Lynn.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank You Everyone for all your work and the code.

    Lynn, your code ran the best. Took only a few seconds over approx 200,000 rows. Very nice

    Thank you all once again, and have a great weekend.

    Andrew

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply