June 19, 2010 at 6:22 pm
Hello all site members ,
I have a table with 100K + records and consisting of 4 columns.website_name,email,fname,lname
The website name field consist of 1 among these 5 websites ebay,amazon,bestbuy,wallmart,target
I need to remove duplicate email id,such that email is which correspond to 'ebay' should not be present for amazon,bestbuy,wallmart,target..then in next iteration the email ids correspond to 'amazon' should not be there for bestbuy,wallmart,target...in 3rd iteration email ids correspond to 'bestbuy' should not be there for wallmart,target ..in 4th email id corresponding to 'wallmart' should not be there for target.
Priority order to keep the email id is ebay>amazon>bestbuy>wallmart>target.
I am wondering how can I do this.This is going to be a regular activity so I cant rely on manual cleansing.Can someone help me with the script/SSIS to do this.Or the logic to write a script?SSIS on this.
🙁
June 20, 2010 at 2:42 am
First: What's the reason to use SSIS for it?
Create a lookup table holding your website_name and your ordering number. Use something along "ROW_NUMBER() OVER (PARTITION BY email ORDER BY website_name_order) as row" in a subquery or CTE. Query for row<>1 in the outer SELECT. Once you've verified that those are the rows you want to delete, change the SELECT to DELETE. You could also add a column to mark the rows that will be deleted to perform some additional tests before physically delete the rows. That's up to you... 😉
Side note: It might be even better to redesign your database and get separate tables for website, email and fname/lname. But that's a different story....
June 21, 2010 at 7:12 am
if you still struggling for solution , post sample data along with desired output.
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply