The following was sent to me by my friend and colleague Dave Miller:
Dave's Email:
Wanted to pass along something I hadn't used before and found useful for easily getting rid of duplicates in a set of data. The functionality has been part of the SQL language for a while and has been supported since SQL Server 2005. It uses Common Table Expressions (CTEs) and the ROW_NUMBER() function.
The PARTITION BY portion of the statement specifies when to reset the row number. In my example I had:
PersonId | Type | PNumber |
A        | 1    | 123     |
A        | 1    | 345     |
B        | 1    | 123     | ***DELETE
A        | 2    | 123     |
B        | 1    | 123     |
B        | 1    | 123     | ***DELETE
A        | 2    | 123     | ***DELETE
B        | 1    | 1234    |
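For anyone who wants to try this out in a scratch database, a rough setup might look like the following. The Person.Phone table here, its column names, and the data types are stand-ins mirroring the example data above, not the real schema:

-- Stand-in table and sample rows mirroring the data above (not the real schema)
IF SCHEMA_ID('Person') IS NULL EXEC('CREATE SCHEMA Person');

CREATE TABLE Person.Phone (
    PersonId    CHAR(1)     NOT NULL,
    PhonetypeId INT         NOT NULL,
    PhoneNumber VARCHAR(20) NOT NULL
);

-- Single-row INSERTs so this also runs on SQL Server 2005
INSERT INTO Person.Phone VALUES ('A', 1, '123');
INSERT INTO Person.Phone VALUES ('A', 1, '345');
INSERT INTO Person.Phone VALUES ('B', 1, '123');
INSERT INTO Person.Phone VALUES ('A', 2, '123');
INSERT INTO Person.Phone VALUES ('B', 1, '123');
INSERT INTO Person.Phone VALUES ('B', 1, '123');
INSERT INTO Person.Phone VALUES ('A', 2, '123');
INSERT INTO Person.Phone VALUES ('B', 1, '1234');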
The following query will remove the duplicates:
WITH PersonPhones AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY PersonId, PhonetypeId, PhoneNumber
                              ORDER BY PersonId, PhonetypeId, PhoneNumber) AS GroupIndex,
           PersonId,
           PhonetypeId,
           PhoneNumber
    FROM Person.Phone pp
)
DELETE FROM PersonPhones WHERE GroupIndex > 1;
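If you'd rather see what will go before deleting anything, you can run the same CTE as a plain SELECT first. This is just the query above with the DELETE swapped for a SELECT:

-- Preview: nothing is deleted, duplicates show up with GroupIndex > 1
WITH PersonPhones AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY PersonId, PhonetypeId, PhoneNumber
                              ORDER BY PersonId, PhonetypeId, PhoneNumber) AS GroupIndex,
           PersonId,
           PhonetypeId,
           PhoneNumber
    FROM Person.Phone
)
SELECT GroupIndex, PersonId, PhonetypeId, PhoneNumber
FROM PersonPhones
ORDER BY PersonId, PhonetypeId, PhoneNumber;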
The CTE produces a result set with the following data, where anything that is a duplicate has a GroupIndex > 1:
GroupIndex | PersonId | Type | PNumber |
1          | A        | 1    | 123     |
1          | A        | 1    | 345     |
1          | A        | 2    | 123     |
2          | A        | 2    | 123     | ***DELETE
1          | B        | 1    | 123     |
2          | B        | 1    | 123     | ***DELETE
3          | B        | 1    | 123     | ***DELETE
1          | B        | 1    | 1234    |
This method can also be used to get the nth item of a group.
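For example, here is a quick sketch of pulling the second phone number per person and type. The partition is only on PersonId and PhonetypeId, and the ORDER BY inside OVER() decides which row counts as "second"; this variation is an illustration of the idea, not part of Dave's original query:

-- Return the 2nd phone number within each (PersonId, PhonetypeId) group
WITH NumberedPhones AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY PersonId, PhonetypeId
                              ORDER BY PhoneNumber) AS GroupIndex,
           PersonId,
           PhonetypeId,
           PhoneNumber
    FROM Person.Phone
)
SELECT PersonId, PhonetypeId, PhoneNumber
FROM NumberedPhones
WHERE GroupIndex = 2;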