source with duplicate records

Question

source with duplicate records

udaya_1222

SSC Rookie

Points: 34
More actions
March 10, 2010 at 9:28 am

#156765

I have a source with duplicate entries i.e records in a source file or table . how can i eliminate the duplicate records , i can write a query to perform this but i want to use a transformation to capture the duplicate records.
Any idea how to eliminate the duplicates or capture duplicates in other o/p.

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

divyanth SSCarpal Tunnel Points: 4303 More actions · Answer 1

You can use a RanK() or Row number() function in SQL to identify duplicate records and then use a Conditional Split and then Split the rows based on that Rank column .. You can capture all the records by keeping a destination for Each condition

udaya_1222 SSC Rookie Points: 34 More actions · Answer 2

Can anyone give me a some idea to eliminate duplicate records without using sql functions and by only using transformations.

divyanth SSCarpal Tunnel Points: 4303 More actions · Answer 3

udaya_1222 (3/10/2010)
Can anyone give me a some idea to eliminate duplicate records without using sql functions and by only using transformations.

You can use sort Transformation and check the option "Remove rows with duplicate sort values" in the Sort Transformation Editor dialog.

But you will not be able to capture duplicate records...

Below Links might help You

Rank transformation that uses the same logic as Rank() function in SQL

http://consultingblogs.emc.com/jamiethomson/archive/2006/09/12/SSIS_3A00_-Rank-Transform.aspx

This might also help

http://toddmcdermid.blogspot.com/2009/01/extracting-duplicate-primary-key-rows.html

udaya_1222 SSC Rookie Points: 34 More actions · Answer 4

Thanks Devyanth , Sort transformation was helpful but now i have a question how should i build my data flow to capture duplicate records in a file and what are all things i should take care of.

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 5

udaya_1222 (3/10/2010)
Can anyone give me a some idea to eliminate duplicate records without using sql functions and by only using transformations.

Heh... what's wrong with a little T-SQL here?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Phil Parkin SSC Guru Points: 247227 More actions · Answer 6

udaya_1222 (3/10/2010)
Thanks Devyanth , Sort transformation was helpful but now i have a question how should i build my data flow to capture duplicate records in a file and what are all things i should take care of.

In your earlier post, you used the word 'eliminate'. Now you want to keep the dups - it would help everyone if you stated your requirements clearly up front, then you will not get inappropriate solution suggestions and waste people's time.

And I'm with Jeff - T-SQL will do this way faster than sort/aggregate functions in SSIS.

divyanth SSCarpal Tunnel Points: 4303 More actions · Answer 7

udaya_1222 (3/10/2010)
Thanks Devyanth , Sort transformation was helpful but now i have a question how should i build my data flow to capture duplicate records in a file and what are all things i should take care of.

You "CANNOT" capture duplicate rows using Sort Transformation.. As every1 suggests the best approach would be to use rank() or RowNumber() functions in T-SQL and then use a Conditional Split --- If you want to capture duplicates