November 26, 2010 at 9:01 am
Hi,
This driving me nuts!
I'm being sent a unicode CSV file that has some text qualifiers and is kind of partially fixed width. The text qualifiers are only used when necessary. For example:
XXXX ,"1,400.00",GBN01 ,*XXXXXXXXX ,20,70.00,"1,400.00",17.50,300
I've XXXXX'd out sensitive data.
The problem is that I'm getting an error when trying to import this line, and it's the field "1,400.00" that is causing the problem. When I set up the Flat File Connection Manager I've specified " as the text qualifier and everything works fine. I preview the file and everything is just as I would expect.
However, when I come to run the SSIS package I get the following error:
"The data value cannot be converted for reasons other than sign mismatch or data overflow"
If I manually change the field from "1,400.00" to 1400.00 with no text qualifier it works fine.
Please, someone, give me some clues as to why this isn't working.
Cheers
George
November 26, 2010 at 9:39 am
George
I think the problem is that 1,400.00 is actually a number, which is being sent as text because it has a "," as a thousands separator. I would import it into a text field and then attempt to convert it into a number.
John
November 26, 2010 at 9:48 am
Hi John,
Thanks for the reply.
I've tried that. It just pushes the problem on. It seems that if I do that when it gets to the next numeric field it's trying to load text in to it. In other words, it's like the comma in the "1,400.00" is being treated as a field separator and the text qualifiers are being ignored.
Could any of this be related to the file being unicode? I tried processing it as non-unicode, but that obviously created problems with the file not being read correctly.
Cheers
George
November 26, 2010 at 9:51 am
George
Another way round it might be to open the file in Excel, format the columns appropriately and then save it and import from that. You could write a VB script that would do that for you in the package.
John
November 26, 2010 at 9:56 am
Hi John,
Yeah, I have thought about doing some pre-processing to convert the file to | or TAB delimeted. I just can't believe it doesn't work straight out of the tin.
I've fired off an email to the supplier who's sending the data to ask if they can change the format of the file. I hope they can as this is driving me nuts 🙂
In the meantime though, if you, or anyone, knows why this is happening I would really appreciate some help.
Cheers
George
November 26, 2010 at 2:41 pm
Hi George,
Its not working because you file is quite non-standard. Best bet is to get the file sent as straight csv or straight fixed width and with consistent use of delimiters.
Otherwise you may need to import the file into a staging table with one column and then parse out that data into meaningful columns into you final destination table.
November 26, 2010 at 7:40 pm
GN-340682 (11/26/2010)
Hi John,Yeah, I have thought about doing some pre-processing to convert the file to | or TAB delimeted. I just can't believe it doesn't work straight out of the tin.
I've fired off an email to the supplier who's sending the data to ask if they can change the format of the file. I hope they can as this is driving me nuts 🙂
In the meantime though, if you, or anyone, knows why this is happening I would really appreciate some help.
Cheers
George
You're taking the correct steps. The first step is always to ask the vendor to meet certain requirements. The problem is that they're probably exporting from a formatted spreadsheet or table or, worse, from a reporting system that does formatting.
Please do post back if you have no success with the vendor. There's a couple of ways to get around this problem "auto-magically" without having to import and re-export from a spreadsheet on your end.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 27, 2010 at 12:46 am
GN-340682 (11/26/2010)
Hi John,Thanks for the reply.
I've tried that. It just pushes the problem on. It seems that if I do that when it gets to the next numeric field it's trying to load text in to it. In other words, it's like the comma in the "1,400.00" is being treated as a field separator and the text qualifiers are being ignored.
Could any of this be related to the file being unicode? I tried processing it as non-unicode, but that obviously created problems with the file not being read correctly.
Cheers
George
It's easy to understand why SSIS is getting confused with this format, which is most unusual. Including a thousands separator in a numeric field is a hindrance and it is this which I would address first. Based on the sample data provided, I can see no straightforward way of pre-processing the data to make it import correctly ('Delete all commas between concurrent sets of double quotes' is easy enough to state but more difficult to code).
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
November 29, 2010 at 3:07 am
Thanks to everyone who posted. It's nice to know I'm not going mad, and it is an odd file format 🙂
Hopefully they'll be able to change format and I won't have to code around it.
Cheers
George
November 30, 2010 at 5:10 am
Just to add a little more detail to this. After much fiddling I managed to get the data to import to the load table. This is how it's being imported (I've turned each column in to a row to make it visually easier to see):
"1
400.00"
XXXXX
XXXXX
20.00
70.00
"1
400.0
As you can see SSIS has completely ignored the fact I've specified a text qualifier of " and effectively pushed all the columns along. Why does this work fine in the Flat File Connection Manager Editor (under the preview tab), but then fails so spectacularly when it comes to running it?
I'm still waiting to hear if the file format can be changed, so in the meantime any ideas on this would be useful. Thanks
Cheers
George
November 30, 2010 at 5:19 am
i had same issue where we use to import CSV files. Best option i would recommend is first clean up your CSV by removing columns with internal COMMA's
Good luck
November 30, 2010 at 6:48 am
SQLSmasher (11/30/2010)
i had same issue where we use to import CSV files. Best option i would recommend is first clean up your CSV by removing columns with internal COMMA'sGood luck
Any suggestions on how you would do that? Maybe some code?
--Jeff Moden
Change is inevitable... Change for the better is not.
November 30, 2010 at 7:13 am
Jeff,
The method i can straight away suggest is to use SSIS.
1. Create in-memory single column table with each csv row as table row
2. find COMMA's between "xxxx,yyyy"
3. replace with "xxxxyyyy"
4. continue with your particular SSIS job by reteriving data from this new table
Thanks
November 30, 2010 at 7:18 am
Part 2 is the tricky bit though. If you've done this before, I think it would be useful if you could provide a code snippet.
Tricky because
"1,400.00",17.50,300
needs to become
"1400.00",17.50,300
so the rules about which commas to remove are not straightforward ...
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
November 30, 2010 at 8:19 am
Here's a *cough* loop version I just tried to solve that little puzzle. I'm more than confident there are much better solutions available...
Edit: a recursive CTE would be possible, too. But at least this one is easily identified as a loop solution... 😉
DECLARE
@t VARCHAR(30),
@repl CHAR(1),
@identifier1Pos INT,
@identifier2Pos INT,
@commaPos INT,
@lengthPos INT
SET @repl='!' -- intermediate replacement for a comma in order not to have to deal with the string changes in length
SET @t='"1400.00,",",17.50",300,",",'
SELECT
@identifier1Pos= charindex('"',@t), -- position of the first identifier
@commaPos= charindex(',',@t,@identifier1Pos+1), -- position of the first comma after the identifier just found
@identifier2Pos= charindex('"',@t,@identifier1Pos+1), -- position of the 2nd identifier
@lengthPos= len(@t) -- overall strin length
WHILE @identifier2Pos>0 and @identifier1Pos>0 and @commaPos>0 -- as long as all there are two identifier and a comma
BEGIN
IF @identifier1Pos< @commaPos and @commaPos < @identifier2Pos -- if comma is in between the two identifier
SELECT @t=stuff(@t,@commaPos,1,@repl) -- replace comma with another character
SELECT -- grab the next set of comma and identifier
@identifier1Pos=charindex('"',@t,@identifier2Pos+1),
@commaPos= charindex(',',@t,@identifier1Pos+1),
@identifier2Pos=charindex('"',@t,@identifier1Pos+1)
END
SELECT REPLACE(@t,@repl,'') -- final output, remove the intermediate character
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply