source with duplicate records

  • I have a source with duplicate entries i.e records in a source file or table . how can i eliminate the duplicate records , i can write a query to perform this but i want to use a transformation to capture the duplicate records.

    Any idea how to eliminate the duplicates or capture duplicates in other o/p.

  • You can use a RanK() or Row number() function in SQL to identify duplicate records and then use a Conditional Split and then Split the rows based on that Rank column .. You can capture all the records by keeping a destination for Each condition

  • Can anyone give me a some idea to eliminate duplicate records without using sql functions and by only using transformations.

  • udaya_1222 (3/10/2010)


    Can anyone give me a some idea to eliminate duplicate records without using sql functions and by only using transformations.

    You can use sort Transformation and check the option "Remove rows with duplicate sort values" in the Sort Transformation Editor dialog.

    But you will not be able to capture duplicate records...

    Below Links might help You

    Rank transformation that uses the same logic as Rank() function in SQL

    http://consultingblogs.emc.com/jamiethomson/archive/2006/09/12/SSIS_3A00_-Rank-Transform.aspx

    This might also help

    http://toddmcdermid.blogspot.com/2009/01/extracting-duplicate-primary-key-rows.html

  • Thanks Devyanth , Sort transformation was helpful but now i have a question how should i build my data flow to capture duplicate records in a file and what are all things i should take care of.

  • udaya_1222 (3/10/2010)


    Can anyone give me a some idea to eliminate duplicate records without using sql functions and by only using transformations.

    Heh... what's wrong with a little T-SQL here?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • udaya_1222 (3/10/2010)


    Thanks Devyanth , Sort transformation was helpful but now i have a question how should i build my data flow to capture duplicate records in a file and what are all things i should take care of.

    In your earlier post, you used the word 'eliminate'. Now you want to keep the dups - it would help everyone if you stated your requirements clearly up front, then you will not get inappropriate solution suggestions and waste people's time.

    And I'm with Jeff - T-SQL will do this way faster than sort/aggregate functions in SSIS.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • udaya_1222 (3/10/2010)


    Thanks Devyanth , Sort transformation was helpful but now i have a question how should i build my data flow to capture duplicate records in a file and what are all things i should take care of.

    You "CANNOT" capture duplicate rows using Sort Transformation.. As every1 suggests the best approach would be to use rank() or RowNumber() functions in T-SQL and then use a Conditional Split --- If you want to capture duplicates

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply