Duplicate Rows when GUID is part of PK - How do you delete them?

  • For instance, the PK is defined as follows:

    KeyID = UniqueIdentifier (GUID)

    FirstName = char(n)

    LastName = char(n)

    and there are other fields which may or may not be unique, for instance:

    MoreInfo = char(n)

    The data looks as follows:

     {F73E32B9-D0DF-427E-A0EC-48EA51658A52} David      Russell    something

     {7DEEBD8A-86D0-47A2-9217-B98255AF57E2} David      Russell    whatever 

     {0C277B36-6423-4C2A-9459-C49097EC28A9} David      Russell    whatever 

     {5C9C9DE1-6283-4F7E-997E-F1886973FAF9} John       Smith      something

    I want to get rid of duplicate rows based on FirstName||LastName ONLY.  I do not care about any of the additional information.  But based on PK these rows are unique.

    We got the job done; but it was messy, and we use GUIDs all over the place, so this is bound to be needed again.  How about a three or four line solution?  Anybody?

    Thank-you,
    David Russell
    Any Cloud, Any Database, Oracle since 1982

  • Firstly, KeyID looks to be always unique, thus could you not remove FirstName and LastName from the primary key and create a unique constraint on these two columns?

    As for automatically removing them my concerns about that would be if the guid was used elsewhere referencing the one in this table you wouldn't be able to remove the row. Also of the duplicates how do you decide which one is the correct row of the two?

    My solution to this problem however was to join the table to itself to find all the duplicates. Guids can be compared to each other in sql server for less than and greater than, by using this it is possible to exclude the highest/lowest guid from a set of results so that the server will not select one of the duplicates. However, the higher/lower seems to be quite arbitry when viewed against the string representation of the guid, but as long as you don't care which duplicate row is decided to be the unique row to be left it doesn't matter.

    My solution

    CREATE TABLE #People (
    KeyID uniqueidentifier,
    FirstName varchar(20),
    LastName varchar(20)
    )
    
    
    INSERT INTO #People (KeyID, FirstName, LastName) VALUES (newid(), 'David', 'Russell')
    INSERT INTO #People (KeyID, FirstName, LastName) VALUES (newid(), 'David', 'Russell')
    INSERT INTO #People (KeyID, FirstName, LastName) VALUES (newid(), 'David', 'Russell')
    INSERT INTO #People (KeyID, FirstName, LastName) VALUES (newid(), 'David', 'Russell')
    INSERT INTO #People (KeyID, FirstName, LastName) VALUES (newid(), 'John', 'Smith')
    INSERT INTO #People (KeyID, FirstName, LastName) VALUES (newid(), 'John', 'Smith')
    INSERT INTO #People (KeyID, FirstName, LastName) VALUES (newid(), 'John', 'Smith')
    INSERT INTO #People (KeyID, FirstName, LastName) VALUES (newid(), 'Joe', 'Doe')
    
    SELECT KeyID, FirstName, LastName FROM #People
    ORDER BY FirstName, LastName, KeyID
    
    --This is the select statement that does the finding of all but the 'first' duplicate
    SELECT DISTINCT b.KeyID, b.FirstName, b.LastName
    FROM #People AS a
    INNER JOIN #People AS b
    ON a.FirstName = b.FirstName
    AND a.LastName = b.LastName
    AND a.KeyID < b.KeyID
    ORDER BY FirstName, LastName, KeyID 
    --the order isn't needed for when used for deleting,
    --just makes the results order the same as the above select
    
    DELETE FROM #People
    WHERE KeyID IN 
    (
    SELECT DISTINCT b.KeyID
    FROM #People AS a
    INNER JOIN #People AS b
    ON a.FirstName = b.FirstName
    AND a.LastName = b.LastName
    AND a.KeyID < b.KeyID
    )
    
    SELECT KeyID, FirstName, LastName FROM #People
    ORDER BY FirstName, LastName, KeyID
    
    DROP TABLE #People
    
  • Good job!  thank you very much.  It will go in my script library for future use.  For anyone else who tries it, you will have to add the "b." to clarify the order by columns; but you get a very clear error where it is required.

    This is not a trivial problem.  Thanks again for your help.

    David

    Thank-you,
    David Russell
    Any Cloud, Any Database, Oracle since 1982

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply