January 14, 2016 at 11:05 am
I wonder whether anyone here fancies the challenge of trying to make this work in SSIS?
Of course, implicit in that is that you make it work faster than the native flat file source, or there is little point.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
January 14, 2016 at 11:12 am
It's all .NET, so could just try a script task or transform in SSIS. 😀
For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help
January 14, 2016 at 11:23 am
Alvin Ramard (1/14/2016)
It's all .NET, so could just try a script task or transform in SSIS. 😀
Script Component Source is where I would start, yes ...
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
January 14, 2016 at 4:25 pm
Phil Parkin (1/14/2016)
I wonder whether anyone here fancies the challenge of trying to make this work in SSIS? Of course, implicit in that is that you make it work faster than the native flat file source, or there is little point.
Saw that. As the author implied, few people get it correct, never mind fast or easy on memory. I'd really have to run it through a knothole to make sure it didn't have any bugs in it, like forgetting (as people typically do) that an ending delimiter means an empty string should be returned after that delimiter. Also, will it return nulls, empty strings, spaces, or nothing at all for adjacent delimiters?
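To make those two edge cases concrete, here is a minimal sketch in Python (split_row is just a made-up name, and a naive split like this has no quote handling; it is shown only to pin down the expected behavior):

```python
def split_row(row: str, delim: str = ",") -> list[str]:
    """Naive row splitter -- no quote handling, shown only to
    illustrate the trailing- and adjacent-delimiter edge cases."""
    return row.split(delim)

# An ending delimiter means an empty string after it:
print(split_row("a,b,"))   # ['a', 'b', '']

# Adjacent delimiters mean an empty string between them:
print(split_row("a,,b"))   # ['a', '', 'b']
```

A splitter that drops those empty strings silently shifts every later field one position to the left, which is exactly the kind of bug that only shows up with real data.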
--Jeff Moden
Change is inevitable... Change for the better is not.
January 15, 2016 at 11:52 am
The fact that the author does not cite RFC4180 is troubling. This is exactly where many implementations start off on the wrong foot: they make up their own CSV file specification.
RFC4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files
The algorithm for processing CSVs is not magical and it surprises me how many vendors (including Microsoft in some of their tools) have gotten it wrong over the years.
Back to the Code Project article... I have written code similar to what is needed to parse a CSV, in order to strip "special" characters from incoming text files. "Special" in this case meant anything not typeable on a US-101 keyboard, plus any control characters (e.g. tabs and line breaks inside text fields). The destination system was not tolerant of "special" characters, and it was more efficient to strip them out in a pre-processing step before they hit the database.

The algorithm processed the file one character at a time and maintained a stack, so that line breaks and tabs could be stripped out when they appeared inside text delimiters (i.e. as part of the text headed for the database) but preserved when they appeared outside a text field, i.e. when they terminate a field or line.
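A minimal sketch of that character-at-a-time approach, in Python rather than .NET, assuming RFC4180 quoting rules (parse_csv_record is a hypothetical name, and this tracks only an in-quotes flag rather than a full stack):

```python
def parse_csv_record(text: str, delim: str = ",", quote: str = '"') -> list[str]:
    """Split one RFC4180-style record into fields, one character at a time.
    Delimiters and line breaks inside quoted fields are kept verbatim;
    a doubled quote inside a quoted field becomes a literal quote."""
    fields: list[str] = []
    buf: list[str] = []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if ch == quote:
                if i + 1 < len(text) and text[i + 1] == quote:
                    buf.append(quote)      # doubled quote -> literal quote
                    i += 1                 # skip the second quote
                else:
                    in_quotes = False      # closing quote
            else:
                buf.append(ch)             # delimiters/newlines kept as data
        elif ch == quote:
            in_quotes = True               # opening quote
        elif ch == delim:
            fields.append("".join(buf))    # field boundary
            buf = []
        else:
            buf.append(ch)
        i += 1
    fields.append("".join(buf))            # last field (may be empty)
    return fields

print(parse_csv_record('a,"b,c","he said ""hi"""'))
# -> ['a', 'b,c', 'he said "hi"']
```

The stripping variant Orlando describes would differ only in the `in_quotes` branch: instead of appending tabs and line breaks verbatim, it would drop or replace them there, while leaving structural delimiters outside quotes untouched.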
I wonder how it compares to the project geared towards processing CSV files in SSIS that is posted on CodePlex:
CodePlex: Delimited File Source
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
January 15, 2016 at 12:31 pm
Orlando Colamatteo (1/15/2016)
The fact that the author is not citing RFC4180 is troubling. This is exactly where many implementations start out on a bad foot, namely they are making up their own CSV file specification. The algorithm for processing CSVs is not magical and it surprises me how many vendors (including Microsoft in some of their tools) have gotten it wrong over the years.
Heh... finally. Something we won't have to do a deep dive on with each other. 😀 I absolutely agree!
--Jeff Moden
Change is inevitable... Change for the better is not.