European character causes error when importing Flat File UTF-8 code page with Ragged right format

Question

Post reply

European character causes error when importing Flat File UTF-8 code page with Ragged right format

jeanyvesR

Say Hey Kid

Points: 696
More actions
May 16, 2012 at 7:55 am

#255983

Hi all,
I use SQL Server 2008R2(US) and I want to load in a table a Ragged right flat file(see file attached) in UTF-8 format that contains some European characters(é,è,â ..) such as Pièce(part in english).
In the Connection Manager I defined following parameters :
General Item :
File Name : c:\temp\TEST-UTF8.txt
Locale : English(United-States)
Code Page : 65001(UTF-8)
Format : Ragged Right
Default regarding other parameters
Columns Item :
Row Delimiter : {CR}{LF}
Advanced Item :
Column0 :
Data Type : DT_STR
inputColumnWidth : 9
OutputColumnWidth : 9
Column1 :
Data Type : DT_STR
inputColumnWidth : 6
OutputColumnWidth : 6
Column2 :
Data Type : DT_STR
inputColumnWidth : N.A
OutputColumnWidth : 4
Preview:
The file Preview you can see that data after a "foreign" character(first row in the file) are offset to the right by the number of "foreign" character encountered.
This offest cause an error if I run the package with a Flat File Source component :
Data conversion failed. The data conversion for column "Column 2" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page."
If text file doens't contain European characters then the package runs works fine.
How can I solve this problem ?
Thanks in advance
Jean-Yves

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Orlando Colamatteo SSC Guru Points: 182279 More actions · Answer 1

Please post a screenshot of your Data Flow.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

jeanyvesR Say Hey Kid Points: 696 More actions · Answer 2

For the moment I made a simple package with only Flat File Source(DATA_FLOW.JPG) because this step doesn't work.

I have also made a second screen capture(DATA_SHIFTED.JPG) as you can see in column2 data shifted on the right of one space because

column 1 has special character (è) .

When I run package I have error on first row and column2 because since data in column2 are shifted of one character to right it cause a problem of data truncation (column2 = " 150"(4 characters) + "0{CR}{LF}).

SSIS doesn't find {CR}{LF} after Column2 but "0{CR}" ...

Thank you for your help.

Orlando Colamatteo SSC Guru Points: 182279 More actions · Answer 3

Well I can confirm the issue in BIDS 2008 (IS v10.50.2806.0). I looked at your test file and the BOM and EOLs are correct. I also checked the small e with grave and it too is represented correctly (0xC3 0xA8). It must be a bug with ragged-right and UTF-8 encoding, or something we are doing wrong in setting up the Flat File Connector. For what it is worth when I modify your file to be comma delimited with CR-LF line breaks (attached) I can import it properly.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Tim Pain Ten Centuries Points: 1136 More actions · Answer 4

I struggled with this as well. It's slightly messy, but what I ended up doing was creating a temp table with a single field, then loading the UTF-8 source into that field and from there just splitting out the individual fields that I needed which a bunch of SUBSTRING commands.

This way the é,è,â etc' were handled correctly.

Let me know if you need more details.

Tim

jeanyvesR Say Hey Kid Points: 696 More actions · Answer 5

Yes . I have also made some test with delimited format and code page UTF-8 and it works fine.

I have also tested with Ragged Right format and Code page 1252 (western European) and it works...

It seems ther is a bug with ragged right format and code page UTF-8.

Tim , Thank you for your workaround .

I'm going to ask a file formated in delimited format rather than ragged right format.

I don't want to complicate my package.

Thank you again for your answers.

Jean-Yves

jeanyvesR Say Hey Kid Points: 696 More actions · Answer 6

Hi all,

I have raised this issue to MS :

http://connect.microsoft.com/SQLServer/feedback/details/742632/european-character-causes-error-when-importing-flat-file-utf-8-code-page-with-ragged-right-format]

They answered : "We are actively looking at the issue and will keep you updated" so wait and see ...