sending the duplicate records to error file

  • Hi,

    I have to eliminate duplicate records in a table. I'm doing that using Sort transform and check the eliminate duplicate.

    only two columns in the table will be used to find the duplicate record. rest of the columns may be duplicate or not.

    i would like to send the rest of the records which are duplicates to error file. how can i do this.

    Thanks,

    Regards

    Viji

  • One possibility is to put a script component after the sort that saves the key fields for the previous row. When a new row comes in, it compares its key fields to what has been saved. If they are the same, it flags this row as a duplicate. It then saves its key fields for the next row to be compared to.

    You can put a conditional split after it to split based on duplicate flag.

    This does involve allowing the sort to pass the duplicates.

    Sample code:

    Public Class ScriptMain

    Inherits UserComponent

    Dim wPrevKeyField1 As String = ""

    Dim wPrevKeyField2 As String = ""

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

    Dim wKeyField1 As String

    Dim wKeyField2 As String

    Dim wIsDup As Boolean

    wKeyField1 = Row.KeyField1

    wKeyField2 = Row.KeyField2

    If wKeyField1 = wPrevKeyField1 And wKeyField2 = wPrevKeyField2 Then

    wIsDup = True

    Else

    wIsDup = False

    End If

    Row.IsDup = wIsDup

    wPrevKeyField1 = wKeyField1

    wPrevKeyField2 = wKeyField2

    End Sub

    End Class

  • Thanks a lot.

    Will try to implement the same. But i'm from database side so don't knw much about .net coding. i'll try to take help from others. is there a way to do it only through SSIS transformations??

  • Change your data source from 'table' to SQL command.

    Select fields from your table and (critically) add a Row_Number() column partitioned on the fields you are using to identify duplicates. So the data rows will be

    1

    1

    1

    2 [dupe]

    3 [dupe]

    1

    2 [dupe]

    etc etc

    Redirect all rows where Row_Number 1 to your error file.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • after several hours trying to figure out how to do this i found your method - and works perfectly - many thanks

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply