Remove Some Duplicate Rows with the Oldest InsertDate

Question

Remove Some Duplicate Rows with the Oldest InsertDate

AndrewSQLDBA

SSC-Insane

Points: 21827
More actions
December 8, 2011 at 9:55 am

#264435

I have been trying to delete the duplicate rows from a table. I want to keep only the rows that have a single entry per person, or the first row entered of a multiple record row per person.
Using the DataEntered column as the reference for the multiple Member rows, keep only the oldest row and all rows that have only a single Member entry
The new data can be placed into a Table Variable or a Temp Table. Does not matter.
IF OBJECT_ID('TempDB..#Member','U') IS NOT NULL
DROP TABLE #Member
--------------------------------------------------------
CREATE TABLE #Member
(
DateEntered DATETIME
, MemberControlID BIGINT
, MemberExternalID BIGINT
, MemberFullName VARCHAR(100)
)
--------------------------------------------------------
INSERT INTO #Member
(DateEntered, MemberControlID, MemberExternalID, MemberFullName)
SELECT '2011-12-07 12:48:43.000',20111207130252,101368889, 'Alex R Bennett' UNION ALL -- Keep This Row
SELECT '2011-12-07 10:30:20.000',20111207115324,101377900, 'Alan G Rivers' UNION ALL ------------------ DELETE THIS ROW --
SELECT '2011-12-07 10:29:50.000',20111207115232,101379911, 'Alan G Rivers' UNION ALL ------------------ DELETE THIS ROW --
SELECT '2011-12-07 10:17:00.000',20111207113463,101381234, 'Alan G Rivers' UNION ALL -- Keep This Row
SELECT '2011-12-07 08:28:10.000',20111207101327,101381237, 'Gregory L Adkinson' UNION ALL -- Keep This Row
SELECT '2011-12-07 08:28:45.000',20111207101392,713886750, 'Gregory L Adkinson' UNION ALL -------------- DELETE THIS ROW --
SELECT '2011-12-07 12:46:37.000',20111207130010,713558747, 'Roger E Saunders' UNION ALL -- Keep This Row
SELECT '2011-12-07 08:21:07.000',20111207100625,713878123, 'Alice A Rogers' UNION ALL -- Keep This Row
SELECT '2011-12-07 10:19:31.000',20111207113756,713880946, 'Mike L Carry' UNION ALL -- Keep This Row
SELECT '2011-12-07 11:19:56.000',20111207121532,713886667, 'Albert A Michaels' UNION ALL -- Keep This Row
SELECT '2011-12-07 12:43:21.000',20111207129702,713887835, 'Alfred B Jones' UNION ALL -- Keep This Row
SELECT '2011-12-07 12:18:53.000',20111207127338,713887858, 'Sam C White' -- Keep This Row
--select * from #Member ORDER BY DateEntered, MemberControlID, MemberFullName ASC
So that I would be left with only the following records that have a distinct MemberFullName, and a DateEntered as the oldest record
SELECT '2011-12-07 12:48:43.000',20111207130252,101368889, 'Alex R Bennett' UNION ALL
SELECT '2011-12-07 10:17:00.000',20111207113463,101381234, 'Alan G Rivers' UNION ALL
SELECT '2011-12-07 08:28:10.000',20111207101327,101381237, 'Gregory L Adkinson' UNION ALL
SELECT '2011-12-07 12:46:37.000',20111207130010,713558747, 'Roger E Saunders' UNION ALL
SELECT '2011-12-07 08:21:07.000',20111207100625,713878123, 'Alice A Rogers' UNION ALL
SELECT '2011-12-07 10:19:31.000',20111207113756,713880946, 'Mike L Carry' UNION ALL
SELECT '2011-12-07 11:19:56.000',20111207121532,713886667, 'Albert A Michaels' UNION ALL
SELECT '2011-12-07 12:43:21.000',20111207129702,713887835, 'Alfred B Jones' UNION ALL
SELECT '2011-12-07 12:18:53.000',20111207127338,713887858, 'Sam C White'
I am really close, but I cannot seem to get this correct. I greatly appreciate any and all help, comments, assistance with this.
Thank You in advance for all your help
Andrew SQLDBA

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

ChazMan SSCrazy Points: 2259 More actions · Answer 1

There are several methods you could follow. Is the MemberControlID unique? If it is, the follow SQL would pull out all the MemberControlID's that you want to keep:

Select MemberControlID

From #Member M

where DateEntered = (select min(DateEntered) from #Member M2

where MemberFullName = M.MemberFullName)

You can use the above SQL with an IN clause to pull your records:

Select * from #Member where MemberControlID

IN

(

Select MemberControlID

From #Member M

where DateEntered = (select min(DateEntered) from #Member M2

where MemberFullName = M.MemberFullName)

)

I think that is what you are looking for.

--------------------------------------------------------------------------
When you realize you've dug yourself into a hole....Step 1...stop digging.

salum SSCommitted Points: 1598 More actions · Answer 2

this will also give result that you want

Select * from #Member except

Select a.* from #Member a inner join #Member b

on a.MemberFullName=b.MemberFullName where a.DateEntered>b.DateEntered

java[/url]

JonFox SSCrazy Points: 2157 More actions · Answer 3

ChazMan's approach works, bit using the nested IN subqueries results in scanning the #Member table three separate times. It can be done in a single scan using an approach like this:

SELECT b.*

FROM (

SELECT MemberFullName, MIN(DateEntered) AS DateEntered

FROM #Member

GROUP BY MemberFullName

) AS a INNER JOIN

#Member AS b ON a.MemberFullName = b.MemberFullName AND

a.DateEntered = b.DateEntered

Run against the provided sample data, with SET STATISTICS IO ON, this results in 1 scan and 1 logical read. This compares to 3 scans and 25 logical reads for the nested subquery approach. Doesn't really matter on a data set this small, but if you're going to end up running this regularly against a much larger table, it could make a significant difference. 🙂

JonFox SSCrazy Points: 2157 More actions · Answer 4

I misspoke a bit...the subquery approach is even less performant than I first wrote, as the execution plan shows that it actually scans the #Member table 25 times. It has to do a full scan for each record in the table. That won't scale well at all. Unfortunately, salum's approach has the same issue. Both result in triangular joins, which perform very poorly with non-trivial record counts.

ChazMan SSCrazy Points: 2259 More actions · Answer 5

Excellent point Jon.

--------------------------------------------------------------------------
When you realize you've dug yourself into a hole....Step 1...stop digging.

AndrewSQLDBA SSC-Insane Points: 21827 More actions · Answer 6

Thank You Everyone

I was close, but just could not get it correct.

Greatly appreciate all the help

Andrew SQLDBA

JonFox SSCrazy Points: 2157 More actions · Answer 7

AndrewSQLDBA (12/8/2011)
Thank You Everyone
I was close, but just could not get it correct.
Greatly appreciate all the help
Andrew SQLDBA

You're welcome!

R.P.Rozema SSChampion Points: 12317 More actions · Answer 8

You can very easily use the row_number() function to do this.

with cte as (

select row_number() over (partition by MemberFullName order by DateEntered) as nr,

*

from #Member

)

delete from cte where nr > 1

The query plan for this shows a single table scan plus a single sort, some more 0% actions and a table delete. If the table has an index on the MemberFullName the queryplan looks even better.

Posting Data Etiquette - Jeff Moden[/url]
Posting Performance Based Questions - Gail Shaw[/url]
Hidden RBAR - Jeff Moden[/url]
Cross Tabs and Pivots - Jeff Moden[/url]
Catch-all queries - Gail Shaw[/url]

If you don't have time to do it right, when will you have time to do it over?