March 31, 2013 at 10:24 am
I posted my problem in other thread also but no go, I think this is the right forum to post my problem. This has been discuss couple of years ago in this forum but I dont think it lead to any solution.
Many people may have faced this problem and I searched many solutions but somehow I could not hit the right one. Sorry for repeatative question.
Here is my data in csv file and I am trying to import into sql server table but it does not import the way it should.
field1,field2,field3,field4,field5,field6
n1,n2,n3,stringWithout Quotes,"String in Quotes with ""words"" and coma , in the string",some more
n1,n2,n3,"stringWith Quotes","String in Quotes with coma , in thestring",some more
I need output
Record 1:
Field1: n1
Field2: n2
Field3: n3
Field4: stringWithout Quotes
Field5: String in Quotes with "words" and coma , in the string
Field6: some more
Record 2:
Field1: n1
Field2: n2
Field3: n3
Field4: stringWith Quotes
Field5: String in Quotes with coma , in thestring
Field6: some more
I tried to cleanup with Regex in script task ,mentioned in some other blog but it did not work and I am not expert to change RegEx to get desired output. Can anyone please help to find solution? Regex should work but any other solution is also welcome.
Thanks.
P.S. I tried to import same csv file in Access and it worked perfactly, so I know the csv format is correct.
April 1, 2013 at 6:35 am
Yuck. Ugly. It's a little tough to diagnose the specific problems you're having without the code for your scripts/regex.
You've got no consistency either within records or across records with the use of commas and quotes as field separators vs embedded characters. MS Office processes delimiters differently than SQL Server and SSIS so what works in Access/Excel isn't a good indicator of how SSIS handles it.
If you have any control or influence over the creation of the source data, see if you can't have the file created with a tab or pipe (|) character.
If not, really your best bet is to do what you've been trying: do some preprocessing on each line of the file before trying to import. RegEx isn't *that* hard, but it does take practice. There are a number of websites around that can help you test your RegEx.
I've also had some success importing files like this into a table where each line of the input file is a record in a varchar(max) field, then using T-SQL REPLACE() and SUBSTRING() in a stored procedure.
____________
Just my $0.02 from over here in the cheap seats of the peanut gallery - please adjust for inflation and/or your local currency.
April 1, 2013 at 8:16 am
This is what I use in RegEx
Regex.Replace(allText, @"""((?:""""|.)*?)""(?!"")", @"|~|$1|~|").Replace("\"\"", "\"")
April 1, 2013 at 12:30 pm
I think the next thing I might try is a script task that reads in a line at a time and then applies logic to parse the line. I'm not sure what that logic would be because you have a very unusal input. But I'm sure it would involve finding the locations of the commas and quotes and then making decisions about whether they function as delimiters or data.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply