Data corruption with SSIS Fuzzy Grouping transformation

  • The following problems occurred several times and I have not been able to identify the root cause:

    I have a table containing customer data (name, firstname, address, etc..) which are going through the fuzzy grouping transformation in order to identify duplicates. The columns used by the fuzzy grouping as criteria to match duplicates are, in some cases corrupted as if there were a buffer reinitialisation issue with this transformation. Here are some examples

    - input data : (I show only an excerpt of the columns used by the fuzzy grouping):

    id : 457-191-1829

    last name: PIERSON

    first name: NATHAN

    address: 6 IMPASSE DU PICABRIER

    - output data :

    ID

    id: 457-191-1829

    last name : PIERSONEZ

    first name: NATHANIERRE

    address: 6 IMPASSE DU PICABRIER DI

    as you can see, some letters have been added at the end of first name, last name, address coming from another record. It looks like there is a buffer reinitialization problem with this transformation.

    Have you ever faced this problem ? any suggestion to work around this issue ?

    Thanks

    Laurent

    Laurent

  • I found the root cause. In fact, the type of my string columns was varchar. I replaced all varchar by char and this problem disappeared. It seems that the fuzzy grouping does not do a good job cleaning bytes used by varchar when it gets data into the buffer.

    Laurent

  • Thanks Grasshopper, I have the same issue.

    When you said you have changed all string columns from varchar to char - do you mean at the OLE DB destination level (where it creates the table)?

  • I have changed the DDL/SQL to create destination table...

    ...replacing...

    "varchar("

    with

    "char("

    ...the data is still corrupted.

    Is there another way?

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply