October 23, 2013 at 6:10 am
Hi,
I have the below query that i need to remove some duplicates, but not quite as simple as it sounds.
What i would like to do is something like:
If [Read_code7],[Read_code] and[Staff_number] are the same then only include one row. The issue i have is that the caseload_holder and [Event_done_at] fields might have different values in them so i cant use a simple distinct.
Hopefully i have explained that well enough!
SELECT
[Patient_Count],
[Read_code7]
,[Read_code]
,[Current_Caseload_Holder]
,[Staff_number]
,[Event_date]
,[Event_done_at]
FROM table
October 23, 2013 at 6:15 am
Use ROW_NUMBER, have a look here http://www.sqlservercentral.com/articles/duplicate+data/102383/
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537October 23, 2013 at 7:50 am
An example that may work for you (as mark suggested in the link provided);WITH Dups AS (
SELECT
[Patient_Count]
,[Read_code7]
,[Read_code]
,[Current_Caseload_Holder]
,[Staff_number]
,[Event_date]
,[Event_done_at],
ROW_NUMBER() OVER (PARTITION BY [Read_code7],[Read_code],[Staff_number] ORDER BY [Read_code7]) AS Cnt
FROM MyTable
) SELECT * FROM Dups WHERE Cnt = 1
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
October 23, 2013 at 8:12 am
Many Thanks to both for your suggestions. I have managed to make it work!
My only question: is there anyway i can include which row it chooses? For example in some of the duplicated rows the[Current_Caseload_Holder] field is populated and then null. I would like to keep the populated field if possible?
October 23, 2013 at 8:15 am
Yes, you just need to change the ORDER BY clause
;WITH Dups AS (
SELECT
[Patient_Count]
,[Read_code7]
,[Read_code]
,[Current_Caseload_Holder]
,[Staff_number]
,[Event_date]
,[Event_done_at],
ROW_NUMBER() OVER (PARTITION BY [Read_code7],[Read_code],[Staff_number] ORDER BY CASE WHEN [Current_Caseload_Holder] IS NOT NULL THEN 0 ELSE 1 END, [Read_code7]) AS Cnt
FROM MyTable
)
DELETE FROM Dups WHERE Cnt > 1
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537October 23, 2013 at 8:15 am
Just add a condition Current_Caseload_Holder IS NOT NULL
Or you can just add to your CTE and fetch records for the ones that are/are not null, using a UNION ALL
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
October 23, 2013 at 8:17 am
MUCH SIMPLER 🙂
Mark-101232 (10/23/2013)
Yes, you just need to change the ORDER BY clause
;WITH Dups AS (
SELECT
[Patient_Count]
,[Read_code7]
,[Read_code]
,[Current_Caseload_Holder]
,[Staff_number]
,[Event_date]
,[Event_done_at],
ROW_NUMBER() OVER (PARTITION BY [Read_code7],[Read_code],[Staff_number] ORDER BY CASE WHEN [Current_Caseload_Holder] IS NOT NULL THEN 0 ELSE 1 END, [Read_code7]) AS Cnt
FROM MyTable
)
DELETE FROM Dups WHERE Cnt > 1
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
October 23, 2013 at 8:25 am
Thanks guys.....worked like a treat!
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply