Duplicate Rows when GUID is part of PK - How do you delete them?

Question

Duplicate Rows when GUID is part of PK - How do you delete them?

debrucer1

SSCertifiable

Points: 5366
More actions
October 6, 2006 at 6:43 pm

#114484

For instance, the PK is defined as follows:
KeyID = UniqueIdentifier (GUID)
FirstName = char(n)
LastName = char(n)
and there are other fields which may or may not be unique, for instance:
MoreInfo = char(n)
The data looks as follows:
{F73E32B9-D0DF-427E-A0EC-48EA51658A52} David     Russell   something
{7DEEBD8A-86D0-47A2-9217-B98255AF57E2} David     Russell   whatever
{0C277B36-6423-4C2A-9459-C49097EC28A9} David     Russell   whatever
{5C9C9DE1-6283-4F7E-997E-F1886973FAF9} John      Smith     something
I want to get rid of duplicate rows based on FirstName||LastName ONLY. I do not care about any of the additional information. But based on PK these rows are unique.
We got the job done; but it was messy, and we use GUIDs all over the place, so this is bound to be needed again. How about a three or four line solution? Anybody?
Thank-you,
David Russell
Any Cloud, Any Database, Oracle since 1982

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply

Chris Chilvers SSCrazy Points: 2260 More actions · Answer 1

Firstly, KeyID looks to be always unique, thus could you not remove FirstName and LastName from the primary key and create a unique constraint on these two columns?

As for automatically removing them my concerns about that would be if the guid was used elsewhere referencing the one in this table you wouldn't be able to remove the row. Also of the duplicates how do you decide which one is the correct row of the two?

My solution to this problem however was to join the table to itself to find all the duplicates. Guids can be compared to each other in sql server for less than and greater than, by using this it is possible to exclude the highest/lowest guid from a set of results so that the server will not select one of the duplicates. However, the higher/lower seems to be quite arbitry when viewed against the string representation of the guid, but as long as you don't care which duplicate row is decided to be the unique row to be left it doesn't matter.

My solution

CREATE TABLE #People (
KeyID uniqueidentifier,
FirstName varchar(20),
LastName varchar(20)
)


INSERT INTO #People (KeyID, FirstName, LastName) VALUES (newid(), 'David', 'Russell')
INSERT INTO #People (KeyID, FirstName, LastName) VALUES (newid(), 'David', 'Russell')
INSERT INTO #People (KeyID, FirstName, LastName) VALUES (newid(), 'David', 'Russell')
INSERT INTO #People (KeyID, FirstName, LastName) VALUES (newid(), 'David', 'Russell')
INSERT INTO #People (KeyID, FirstName, LastName) VALUES (newid(), 'John', 'Smith')
INSERT INTO #People (KeyID, FirstName, LastName) VALUES (newid(), 'John', 'Smith')
INSERT INTO #People (KeyID, FirstName, LastName) VALUES (newid(), 'John', 'Smith')
INSERT INTO #People (KeyID, FirstName, LastName) VALUES (newid(), 'Joe', 'Doe')

SELECT KeyID, FirstName, LastName FROM #People
ORDER BY FirstName, LastName, KeyID

--This is the select statement that does the finding of all but the 'first' duplicate
SELECT DISTINCT b.KeyID, b.FirstName, b.LastName
FROM #People AS a
INNER JOIN #People AS b
ON a.FirstName = b.FirstName
AND a.LastName = b.LastName
AND a.KeyID < b.KeyID
ORDER BY FirstName, LastName, KeyID 
--the order isn't needed for when used for deleting,
--just makes the results order the same as the above select

DELETE FROM #People
WHERE KeyID IN 
(
SELECT DISTINCT b.KeyID
FROM #People AS a
INNER JOIN #People AS b
ON a.FirstName = b.FirstName
AND a.LastName = b.LastName
AND a.KeyID < b.KeyID
)

SELECT KeyID, FirstName, LastName FROM #People
ORDER BY FirstName, LastName, KeyID

DROP TABLE #People

debrucer1 SSCertifiable Points: 5366 More actions · Answer 2

Good job! thank you very much. It will go in my script library for future use. For anyone else who tries it, you will have to add the "b." to clarify the order by columns; but you get a very clear error where it is required.

This is not a trivial problem. Thanks again for your help.

David

Thank-you,
David Russell
Any Cloud, Any Database, Oracle since 1982