December 8, 2011 at 9:55 am
I have been trying to delete the duplicate rows from a table. I want to keep only the rows that have a single entry per person, or the first row entered of a multiple record row per person.
Using the DataEntered column as the reference for the multiple Member rows, keep only the oldest row and all rows that have only a single Member entry
The new data can be placed into a Table Variable or a Temp Table. Does not matter.
IF OBJECT_ID('TempDB..#Member','U') IS NOT NULL
DROP TABLE #Member
--------------------------------------------------------
CREATE TABLE #Member
(
DateEntered DATETIME
, MemberControlID BIGINT
, MemberExternalID BIGINT
, MemberFullName VARCHAR(100)
)
--------------------------------------------------------
INSERT INTO #Member
(DateEntered, MemberControlID, MemberExternalID, MemberFullName)
SELECT '2011-12-07 12:48:43.000',20111207130252,101368889, 'Alex R Bennett' UNION ALL -- Keep This Row
SELECT '2011-12-07 10:30:20.000',20111207115324,101377900, 'Alan G Rivers' UNION ALL ------------------ DELETE THIS ROW --
SELECT '2011-12-07 10:29:50.000',20111207115232,101379911, 'Alan G Rivers' UNION ALL ------------------ DELETE THIS ROW --
SELECT '2011-12-07 10:17:00.000',20111207113463,101381234, 'Alan G Rivers' UNION ALL -- Keep This Row
SELECT '2011-12-07 08:28:10.000',20111207101327,101381237, 'Gregory L Adkinson' UNION ALL -- Keep This Row
SELECT '2011-12-07 08:28:45.000',20111207101392,713886750, 'Gregory L Adkinson' UNION ALL -------------- DELETE THIS ROW --
SELECT '2011-12-07 12:46:37.000',20111207130010,713558747, 'Roger E Saunders' UNION ALL -- Keep This Row
SELECT '2011-12-07 08:21:07.000',20111207100625,713878123, 'Alice A Rogers' UNION ALL -- Keep This Row
SELECT '2011-12-07 10:19:31.000',20111207113756,713880946, 'Mike L Carry' UNION ALL -- Keep This Row
SELECT '2011-12-07 11:19:56.000',20111207121532,713886667, 'Albert A Michaels' UNION ALL -- Keep This Row
SELECT '2011-12-07 12:43:21.000',20111207129702,713887835, 'Alfred B Jones' UNION ALL -- Keep This Row
SELECT '2011-12-07 12:18:53.000',20111207127338,713887858, 'Sam C White' -- Keep This Row
--select * from #Member ORDER BY DateEntered, MemberControlID, MemberFullName ASC
So that I would be left with only the following records that have a distinct MemberFullName, and a DateEntered as the oldest record
SELECT '2011-12-07 12:48:43.000',20111207130252,101368889, 'Alex R Bennett' UNION ALL
SELECT '2011-12-07 10:17:00.000',20111207113463,101381234, 'Alan G Rivers' UNION ALL
SELECT '2011-12-07 08:28:10.000',20111207101327,101381237, 'Gregory L Adkinson' UNION ALL
SELECT '2011-12-07 12:46:37.000',20111207130010,713558747, 'Roger E Saunders' UNION ALL
SELECT '2011-12-07 08:21:07.000',20111207100625,713878123, 'Alice A Rogers' UNION ALL
SELECT '2011-12-07 10:19:31.000',20111207113756,713880946, 'Mike L Carry' UNION ALL
SELECT '2011-12-07 11:19:56.000',20111207121532,713886667, 'Albert A Michaels' UNION ALL
SELECT '2011-12-07 12:43:21.000',20111207129702,713887835, 'Alfred B Jones' UNION ALL
SELECT '2011-12-07 12:18:53.000',20111207127338,713887858, 'Sam C White'
I am really close, but I cannot seem to get this correct. I greatly appreciate any and all help, comments, assistance with this.
Thank You in advance for all your help
Andrew SQLDBA
December 8, 2011 at 10:05 am
There are several methods you could follow. Is the MemberControlID unique? If it is, the follow SQL would pull out all the MemberControlID's that you want to keep:
Select MemberControlID
From #Member M
where DateEntered = (select min(DateEntered) from #Member M2
where MemberFullName = M.MemberFullName)
You can use the above SQL with an IN clause to pull your records:
Select * from #Member where MemberControlID
IN
(
Select MemberControlID
From #Member M
where DateEntered = (select min(DateEntered) from #Member M2
where MemberFullName = M.MemberFullName)
)
I think that is what you are looking for.
--------------------------------------------------------------------------
When you realize you've dug yourself into a hole....Step 1...stop digging.
December 8, 2011 at 11:38 am
December 8, 2011 at 11:42 am
ChazMan's approach works, bit using the nested IN subqueries results in scanning the #Member table three separate times. It can be done in a single scan using an approach like this:
SELECT b.*
FROM (
SELECT MemberFullName, MIN(DateEntered) AS DateEntered
FROM #Member
GROUP BY MemberFullName
) AS a INNER JOIN
#Member AS b ON a.MemberFullName = b.MemberFullName AND
a.DateEntered = b.DateEntered
Run against the provided sample data, with SET STATISTICS IO ON, this results in 1 scan and 1 logical read. This compares to 3 scans and 25 logical reads for the nested subquery approach. Doesn't really matter on a data set this small, but if you're going to end up running this regularly against a much larger table, it could make a significant difference. 🙂
December 8, 2011 at 11:46 am
I misspoke a bit...the subquery approach is even less performant than I first wrote, as the execution plan shows that it actually scans the #Member table 25 times. It has to do a full scan for each record in the table. That won't scale well at all. Unfortunately, salum's approach has the same issue. Both result in triangular joins, which perform very poorly with non-trivial record counts.
December 8, 2011 at 11:48 am
Excellent point Jon.
--------------------------------------------------------------------------
When you realize you've dug yourself into a hole....Step 1...stop digging.
December 8, 2011 at 11:56 am
Thank You Everyone
I was close, but just could not get it correct.
Greatly appreciate all the help
Andrew SQLDBA
December 8, 2011 at 12:01 pm
AndrewSQLDBA (12/8/2011)
Thank You EveryoneI was close, but just could not get it correct.
Greatly appreciate all the help
Andrew SQLDBA
You're welcome!
December 8, 2011 at 12:24 pm
You can very easily use the row_number() function to do this.
with cte as (
select row_number() over (partition by MemberFullName order by DateEntered) as nr,
*
from #Member
)
delete from cte where nr > 1
The query plan for this shows a single table scan plus a single sort, some more 0% actions and a table delete. If the table has an index on the MemberFullName the queryplan looks even better.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply