Deduplication Help

Question

Post reply

Deduplication Help

MostInterestingMan

SSCertifiable

Points: 6291
More actions
January 30, 2012 at 10:00 am

#252124

Hi,
My goal is to dedupe (with a twist) the sample table below with the following criteria. For each duplicate record I want to:
1) If [item_ordered?] column is 'Yes' for all the deplicate records keep any one of the records - doesn't matter which record.
2) If [item_ordered?] column is 'No' for all the duplicate records keep any one of the records - doesn't matter with record.
3) If [item_ordered?] column is 'Yes' for some duplicate records and 'No' for others, keep one of the 'Yes' records - doesn't matter which record.
/* sample dataset */
CREATE TABLE #TestTable (id INT, order_status VARCHAR(10), [item_ordered?] VARCHAR(3))
INSERT #TestTable (id, order_status, [item_ordered?])
SELECT 100, 'Ordered', 'Yes' UNION ALL
SELECT 100, 'Staged', 'Yes' UNION ALL
SELECT 100, 'Shipped', 'Yes' UNION ALL
SELECT 200, 'Hold', 'No' UNION ALL
SELECT 200, 'Cancelled', 'No' UNION ALL
SELECT 200, 'Hold', 'No' UNION ALL
SELECT 200, 'Returned', 'No' UNION ALL
SELECT 300, 'Ordered', 'Yes' UNION ALL
SELECT 300, 'Ordered', 'Yes' UNION ALL
SELECT 300, 'Staged', 'Yes' UNION ALL
SELECT 300, 'Shipped', 'Yes' UNION ALL
SELECT 300, 'Shipped', 'Yes' UNION ALL
SELECT 400, 'Cancelled', 'No' UNION ALL
SELECT 400, 'Review', 'No' UNION ALL
SELECT 400, 'Hold', 'No' UNION ALL
SELECT 400, 'Returned', 'No' UNION ALL
SELECT 500, 'Ordered', 'Yes' UNION ALL
SELECT 500, 'Pending', 'No' UNION ALL
SELECT 500, 'Staged', 'Yes' UNION ALL
SELECT 500, 'Shipped', 'Yes' UNION ALL
SELECT 600, 'Hold', 'No' UNION ALL
SELECT 600, 'Review', 'No' UNION ALL
SELECT 600, 'Pending', 'Yes' UNION ALL
SELECT 700, 'Ordered', 'Yes' UNION ALL
SELECT 700, 'Hold', 'No' UNION ALL
SELECT 700, 'Cancelled', 'No' UNION ALL
SELECT 700, 'Returned', 'No' UNION ALL
SELECT 800, 'Hold', 'No' UNION ALL
SELECT 800, 'Returned', 'No' UNION ALL
SELECT 900, 'Staged', 'Yes' UNION ALL
SELECT 900, 'Shipped', 'Yes'
GO
Thanks for your help.
MCTS: BI 2008, MCITP: BI 2008
Stay Thirsty My Friends

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Mark Cowne One Orange Chip Points: 26990 More actions · Answer 1

WITH CTE AS (

SELECT *,

ROW_NUMBER() OVER(PARTITION BY id, order_status

ORDER BY CASE WHEN [item_ordered?]='Yes' THEN 0 ELSE 1 END) AS rn

FROM #TestTable)

DELETE FROM CTE

WHERE rn>1;

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

spaghettidba SSC Guru Points: 105732 More actions · Answer 2

First of all you have to define what "duplicate" means to you. Duplicate records for the same Id? Duplicate records for the same id+status?

If the former, this should do:

SELECT *

FROM (

SELECT *, RN = ROW_NUMBER() OVER (PARTITION BY id ORDER BY [item_ordered?] DESC, id ASC)

FROM #TestTable

) AS src

WHERE RN = 1

-- Gianluca Sartori

MostInterestingMan SSCertifiable Points: 6291 More actions · Answer 3

Gianluca Sartori (1/30/2012)
First of all you have to define what "duplicate" means to you. Duplicate records for the same Id? Duplicate records for the same id+status?
If the former, this should do:
SELECT *
FROM (
SELECT *, RN = ROW_NUMBER() OVER (PARTITION BY id ORDER BY [item_ordered?] DESC, id ASC)
FROM #TestTable
) AS src
WHERE RN = 1

Sorry I wasn't clear - Deplicate for the same ID not ID+status.

MCTS: BI 2008, MCITP: BI 2008
Stay Thirsty My Friends

spaghettidba SSC Guru Points: 105732 More actions · Answer 4

spaghettidba

SSC Guru

Points: 105732

January 30, 2012 at 11:45 am

#1440178

I think you should be ok, then.

-- Gianluca Sartori

MostInterestingMan SSCertifiable Points: 6291 More actions · Answer 5

This is exactly the solution I was looking for..works like a charm. Thanks for all your help.

MCTS: BI 2008, MCITP: BI 2008
Stay Thirsty My Friends

Siva Ramasamy SSCarpal Tunnel Points: 4026 More actions · Answer 6

Hi,

I was just following this thread and understood the requirement and I also executed your solution and found that it satisfies the requirement...but not able to understand how you achieved it.

Can you please explain..?

Thanks!

Siva.

spaghettidba SSC Guru Points: 105732 More actions · Answer 7

ROW_NUMBER() OVER (PARTITION BY id ORDER BY [item_ordered?] DESC, id ASC)

means "give each row in the table a row number (rank) sorting rows by [item_ordered?] in reverse order (so that "yes" ranks higher than "no") and then by id, partitioning by id (means that the rank will be "reset" for each different id)".

Applied to the test data, it gives this output:

id order_status item_ordered? RN

----------- ------------ ------------- --------------------

100 Ordered Yes 1

100 Staged Yes 2

100 Shipped Yes 3

200 Hold No 1

200 Cancelled No 2

200 Hold No 3

200 Returned No 4

300 Ordered Yes 1

300 Ordered Yes 2

300 Staged Yes 3

300 Shipped Yes 4

300 Shipped Yes 5

400 Cancelled No 1

400 Review No 2

400 Hold No 3

400 Returned No 4

500 Ordered Yes 1

500 Staged Yes 2

500 Shipped Yes 3

500 Pending No 4

600 Pending Yes 1

600 Hold No 2

600 Review No 3

700 Ordered Yes 1

700 Hold No 2

700 Cancelled No 3

700 Returned No 4

800 Hold No 1

800 Returned No 2

900 Staged Yes 1

900 Shipped Yes 2

Hope this answers your question.

-- Gianluca Sartori

Ron McCullough SSC Guru Points: 63877 More actions · Answer 8

Siva Ramasamy (1/30/2012)
Hi,
I was just following this thread and understood the requirement and I also executed your solution and found that it satisfies the requirement...but not able to understand how you achieved it.
Can you please explain..?
Thanks!
Siva.

The following may answer your question;

http://www.simple-talk.com/sql/t-sql-programming/sql-server-cte-basics/

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read[/url]
Before posting a performance problem please read[/url]

Siva Ramasamy SSCarpal Tunnel Points: 4026 More actions · Answer 9

Thanks Gianluca Sartori and Thanks Ron.

I understood the logic.