July 28, 2008 at 7:38 am
The following problems occurred several times and I have not been able to identify the root cause:
I have a table containing customer data (name, firstname, address, etc..) which are going through the fuzzy grouping transformation in order to identify duplicates. The columns used by the fuzzy grouping as criteria to match duplicates are, in some cases corrupted as if there were a buffer reinitialisation issue with this transformation. Here are some examples
- input data : (I show only an excerpt of the columns used by the fuzzy grouping):
id : 457-191-1829
last name: PIERSON
first name: NATHAN
address: 6 IMPASSE DU PICABRIER
- output data :
ID
id: 457-191-1829
last name : PIERSONEZ
first name: NATHANIERRE
address: 6 IMPASSE DU PICABRIER DI
as you can see, some letters have been added at the end of first name, last name, address coming from another record. It looks like there is a buffer reinitialization problem with this transformation.
Have you ever faced this problem ? any suggestion to work around this issue ?
Thanks
Laurent
Laurent
July 29, 2008 at 3:54 pm
I found the root cause. In fact, the type of my string columns was varchar. I replaced all varchar by char and this problem disappeared. It seems that the fuzzy grouping does not do a good job cleaning bytes used by varchar when it gets data into the buffer.
Laurent
September 2, 2011 at 2:07 pm
Thanks Grasshopper, I have the same issue.
When you said you have changed all string columns from varchar to char - do you mean at the OLE DB destination level (where it creates the table)?
September 2, 2011 at 2:50 pm
I have changed the DDL/SQL to create destination table...
...replacing...
"varchar("
with
"char("
...the data is still corrupted.
Is there another way?
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply