March 10, 2010 at 9:28 am
I have a source with duplicate entries i.e records in a source file or table . how can i eliminate the duplicate records , i can write a query to perform this but i want to use a transformation to capture the duplicate records.
Any idea how to eliminate the duplicates or capture duplicates in other o/p.
March 10, 2010 at 9:46 am
You can use a RanK() or Row number() function in SQL to identify duplicate records and then use a Conditional Split and then Split the rows based on that Rank column .. You can capture all the records by keeping a destination for Each condition
March 10, 2010 at 10:18 am
Can anyone give me a some idea to eliminate duplicate records without using sql functions and by only using transformations.
March 10, 2010 at 11:27 am
udaya_1222 (3/10/2010)
Can anyone give me a some idea to eliminate duplicate records without using sql functions and by only using transformations.
You can use sort Transformation and check the option "Remove rows with duplicate sort values" in the Sort Transformation Editor dialog.
But you will not be able to capture duplicate records...
Below Links might help You
Rank transformation that uses the same logic as Rank() function in SQL
http://consultingblogs.emc.com/jamiethomson/archive/2006/09/12/SSIS_3A00_-Rank-Transform.aspx
This might also help
http://toddmcdermid.blogspot.com/2009/01/extracting-duplicate-primary-key-rows.html
March 10, 2010 at 1:34 pm
Thanks Devyanth , Sort transformation was helpful but now i have a question how should i build my data flow to capture duplicate records in a file and what are all things i should take care of.
March 10, 2010 at 4:34 pm
udaya_1222 (3/10/2010)
Can anyone give me a some idea to eliminate duplicate records without using sql functions and by only using transformations.
Heh... what's wrong with a little T-SQL here?
--Jeff Moden
Change is inevitable... Change for the better is not.
March 11, 2010 at 1:33 am
udaya_1222 (3/10/2010)
Thanks Devyanth , Sort transformation was helpful but now i have a question how should i build my data flow to capture duplicate records in a file and what are all things i should take care of.
In your earlier post, you used the word 'eliminate'. Now you want to keep the dups - it would help everyone if you stated your requirements clearly up front, then you will not get inappropriate solution suggestions and waste people's time.
And I'm with Jeff - T-SQL will do this way faster than sort/aggregate functions in SSIS.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
March 11, 2010 at 9:11 am
udaya_1222 (3/10/2010)
Thanks Devyanth , Sort transformation was helpful but now i have a question how should i build my data flow to capture duplicate records in a file and what are all things i should take care of.
You "CANNOT" capture duplicate rows using Sort Transformation.. As every1 suggests the best approach would be to use rank() or RowNumber() functions in T-SQL and then use a Conditional Split --- If you want to capture duplicates
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply