Removing duplicate data from column 2 on the priority of column 1

Question

Removing duplicate data from column 2 on the priority of column 1

ondrocks

Old Hand

Points: 393
More actions
June 19, 2010 at 6:22 pm

#94921

Hello all site members ,
I have a table with 100K + records and consisting of 4 columns.website_name,email,fname,lname
The website name field consist of 1 among these 5 websites ebay,amazon,bestbuy,wallmart,target
I need to remove duplicate email id,such that email is which correspond to 'ebay' should not be present for amazon,bestbuy,wallmart,target..then in next iteration the email ids correspond to 'amazon' should not be there for bestbuy,wallmart,target...in 3rd iteration email ids correspond to 'bestbuy' should not be there for wallmart,target ..in 4th email id corresponding to 'wallmart' should not be there for target.
Priority order to keep the email id is ebay>amazon>bestbuy>wallmart>target.
I am wondering how can I do this.This is going to be a regular activity so I cant rely on manual cleansing.Can someone help me with the script/SSIS to do this.Or the logic to write a script?SSIS on this.
🙁

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply

LutzM SSC Guru Points: 107049 More actions · Answer 1

First: What's the reason to use SSIS for it?

Create a lookup table holding your website_name and your ordering number. Use something along "ROW_NUMBER() OVER (PARTITION BY email ORDER BY website_name_order) as row" in a subquery or CTE. Query for row<>1 in the outer SELECT. Once you've verified that those are the rows you want to delete, change the SELECT to DELETE. You could also add a column to mark the rows that will be deleted to perform some additional tests before physically delete the rows. That's up to you... 😉

Side note: It might be even better to redesign your database and get separate tables for website, email and fname/lname. But that's a different story....

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Bhuvnesh SSC Guru Points: 59351 More actions · Answer 2

if you still struggling for solution , post sample data along with desired output.

-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)