April 29, 2009 at 5:43 am
Hi,
I have to eliminate duplicate records in a table. I'm doing that using Sort transform and check the eliminate duplicate.
only two columns in the table will be used to find the duplicate record. rest of the columns may be duplicate or not.
i would like to send the rest of the records which are duplicates to error file. how can i do this.
Thanks,
Regards
Viji
May 1, 2009 at 5:49 am
One possibility is to put a script component after the sort that saves the key fields for the previous row. When a new row comes in, it compares its key fields to what has been saved. If they are the same, it flags this row as a duplicate. It then saves its key fields for the next row to be compared to.
You can put a conditional split after it to split based on duplicate flag.
This does involve allowing the sort to pass the duplicates.
Sample code:
Public Class ScriptMain
Inherits UserComponent
Dim wPrevKeyField1 As String = ""
Dim wPrevKeyField2 As String = ""
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Dim wKeyField1 As String
Dim wKeyField2 As String
Dim wIsDup As Boolean
wKeyField1 = Row.KeyField1
wKeyField2 = Row.KeyField2
If wKeyField1 = wPrevKeyField1 And wKeyField2 = wPrevKeyField2 Then
wIsDup = True
Else
wIsDup = False
End If
Row.IsDup = wIsDup
wPrevKeyField1 = wKeyField1
wPrevKeyField2 = wKeyField2
End Sub
End Class
May 3, 2009 at 11:32 pm
Thanks a lot.
Will try to implement the same. But i'm from database side so don't knw much about .net coding. i'll try to take help from others. is there a way to do it only through SSIS transformations??
May 4, 2009 at 12:02 am
Change your data source from 'table' to SQL command.
Select fields from your table and (critically) add a Row_Number() column partitioned on the fields you are using to identify duplicates. So the data rows will be
1
1
1
2 [dupe]
3 [dupe]
1
2 [dupe]
etc etc
Redirect all rows where Row_Number 1 to your error file.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
July 15, 2009 at 7:36 am
after several hours trying to figure out how to do this i found your method - and works perfectly - many thanks
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply