January 30, 2012 at 10:00 am
Hi,
My goal is to dedupe (with a twist) the sample table below with the following criteria. For each duplicate record I want to:
1) If [item_ordered?] column is 'Yes' for all the deplicate records keep any one of the records - doesn't matter which record.
2) If [item_ordered?] column is 'No' for all the duplicate records keep any one of the records - doesn't matter with record.
3) If [item_ordered?] column is 'Yes' for some duplicate records and 'No' for others, keep one of the 'Yes' records - doesn't matter which record.
/* sample dataset */
CREATE TABLE #TestTable (id INT, order_status VARCHAR(10), [item_ordered?] VARCHAR(3))
INSERT #TestTable (id, order_status, [item_ordered?])
SELECT 100, 'Ordered', 'Yes' UNION ALL
SELECT 100, 'Staged', 'Yes' UNION ALL
SELECT 100, 'Shipped', 'Yes' UNION ALL
SELECT 200, 'Hold', 'No' UNION ALL
SELECT 200, 'Cancelled', 'No' UNION ALL
SELECT 200, 'Hold', 'No' UNION ALL
SELECT 200, 'Returned', 'No' UNION ALL
SELECT 300, 'Ordered', 'Yes' UNION ALL
SELECT 300, 'Ordered', 'Yes' UNION ALL
SELECT 300, 'Staged', 'Yes' UNION ALL
SELECT 300, 'Shipped', 'Yes' UNION ALL
SELECT 300, 'Shipped', 'Yes' UNION ALL
SELECT 400, 'Cancelled', 'No' UNION ALL
SELECT 400, 'Review', 'No' UNION ALL
SELECT 400, 'Hold', 'No' UNION ALL
SELECT 400, 'Returned', 'No' UNION ALL
SELECT 500, 'Ordered', 'Yes' UNION ALL
SELECT 500, 'Pending', 'No' UNION ALL
SELECT 500, 'Staged', 'Yes' UNION ALL
SELECT 500, 'Shipped', 'Yes' UNION ALL
SELECT 600, 'Hold', 'No' UNION ALL
SELECT 600, 'Review', 'No' UNION ALL
SELECT 600, 'Pending', 'Yes' UNION ALL
SELECT 700, 'Ordered', 'Yes' UNION ALL
SELECT 700, 'Hold', 'No' UNION ALL
SELECT 700, 'Cancelled', 'No' UNION ALL
SELECT 700, 'Returned', 'No' UNION ALL
SELECT 800, 'Hold', 'No' UNION ALL
SELECT 800, 'Returned', 'No' UNION ALL
SELECT 900, 'Staged', 'Yes' UNION ALL
SELECT 900, 'Shipped', 'Yes'
GO
Thanks for your help.
MCTS: BI 2008, MCITP: BI 2008
Stay Thirsty My Friends
January 30, 2012 at 10:09 am
WITH CTE AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY id, order_status
ORDER BY CASE WHEN [item_ordered?]='Yes' THEN 0 ELSE 1 END) AS rn
FROM #TestTable)
DELETE FROM CTE
WHERE rn>1;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537January 30, 2012 at 10:13 am
First of all you have to define what "duplicate" means to you. Duplicate records for the same Id? Duplicate records for the same id+status?
If the former, this should do:
SELECT *
FROM (
SELECT *, RN = ROW_NUMBER() OVER (PARTITION BY id ORDER BY [item_ordered?] DESC, id ASC)
FROM #TestTable
) AS src
WHERE RN = 1
-- Gianluca Sartori
January 30, 2012 at 11:11 am
Gianluca Sartori (1/30/2012)
First of all you have to define what "duplicate" means to you. Duplicate records for the same Id? Duplicate records for the same id+status?If the former, this should do:
SELECT *
FROM (
SELECT *, RN = ROW_NUMBER() OVER (PARTITION BY id ORDER BY [item_ordered?] DESC, id ASC)
FROM #TestTable
) AS src
WHERE RN = 1
Sorry I wasn't clear - Deplicate for the same ID not ID+status.
MCTS: BI 2008, MCITP: BI 2008
Stay Thirsty My Friends
January 30, 2012 at 11:45 am
I think you should be ok, then.
-- Gianluca Sartori
January 30, 2012 at 2:38 pm
This is exactly the solution I was looking for..works like a charm. Thanks for all your help.
MCTS: BI 2008, MCITP: BI 2008
Stay Thirsty My Friends
January 30, 2012 at 2:55 pm
Hi,
I was just following this thread and understood the requirement and I also executed your solution and found that it satisfies the requirement...but not able to understand how you achieved it.
Can you please explain..?
Thanks!
Siva.
January 30, 2012 at 3:44 pm
ROW_NUMBER() OVER (PARTITION BY id ORDER BY [item_ordered?] DESC, id ASC)
means "give each row in the table a row number (rank) sorting rows by [item_ordered?] in reverse order (so that "yes" ranks higher than "no") and then by id, partitioning by id (means that the rank will be "reset" for each different id)".
Applied to the test data, it gives this output:
id order_status item_ordered? RN
----------- ------------ ------------- --------------------
100 Ordered Yes 1
100 Staged Yes 2
100 Shipped Yes 3
200 Hold No 1
200 Cancelled No 2
200 Hold No 3
200 Returned No 4
300 Ordered Yes 1
300 Ordered Yes 2
300 Staged Yes 3
300 Shipped Yes 4
300 Shipped Yes 5
400 Cancelled No 1
400 Review No 2
400 Hold No 3
400 Returned No 4
500 Ordered Yes 1
500 Staged Yes 2
500 Shipped Yes 3
500 Pending No 4
600 Pending Yes 1
600 Hold No 2
600 Review No 3
700 Ordered Yes 1
700 Hold No 2
700 Cancelled No 3
700 Returned No 4
800 Hold No 1
800 Returned No 2
900 Staged Yes 1
900 Shipped Yes 2
Hope this answers your question.
-- Gianluca Sartori
January 30, 2012 at 3:44 pm
Siva Ramasamy (1/30/2012)
Hi,I was just following this thread and understood the requirement and I also executed your solution and found that it satisfies the requirement...but not able to understand how you achieved it.
Can you please explain..?
Thanks!
Siva.
The following may answer your question;
http://www.simple-talk.com/sql/t-sql-programming/sql-server-cte-basics/
January 30, 2012 at 4:09 pm
Thanks Gianluca Sartori and Thanks Ron.
I understood the logic.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply