March 5, 2008 at 3:21 pm
Cool... let's see your solution for this, Brandon.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 6, 2008 at 6:48 am
Heh... "hour or so"? Like I said previously, I've used this method to do 5.1 million rows in about 3 minutes. I'm sure there's a faster method, but it worked for me.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 6, 2008 at 7:15 am
Just to clarify - the "hour or so" being referred to was the time taken to create the package.
Rgds,
Paul Ibison
March 6, 2008 at 8:57 am
The "Ragged Right" Files I have to deal with are Mixed Row Type files as well. Because of that, I cannot use column names from the csv file, but also, each row starts with a TypeID.
So, in DTS, I create a Transform Data Task for each row type, with this ActiveX script on the first column:
Function Main()
    ' Keep the row only if the first column matches this task's row type;
    ' otherwise skip it so the other transform tasks can pick it up.
    If DTSSource("Col001") = "HDRID" Then
        Main = DTSTransformStat_OK
    Else
        Main = DTSTransformStat_SkipRow
    End If
End Function
In the article's example, your initial transform would just check the length and skip the row if it was a header or footer, along the lines of the sketch below. There's no need to import-export-import, even to get the column headings.
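A minimal sketch of that length check (assuming the whole row arrives as a single column; the 20-character threshold is a made-up value, so use whatever separates header/footer rows from data rows in your file):

Function Main()
    ' Hypothetical: header and footer rows are shorter than any data row,
    ' so skip anything under the threshold.
    If Len(DTSSource("Col001")) > 20 Then
        Main = DTSTransformStat_OK
    Else
        Main = DTSTransformStat_SkipRow
    End If
End Function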
--
JimFive
March 6, 2008 at 9:34 am
Brandon,
with all due respect, I think you're missing the point.
Judging by some of the questions, some others (Jeff apart) have also never really tried to solve this problem in practice.
Take a CSV file with 50 columns and use your conditional split starting with a single column, and see how long it takes. You'll have to define each of the 50 columns separately, and define the substring function separately for each one. In the example, the CSV columns have no fixed width, so your substring function will have to account for finding the commas, which may be nested inside a quoted string. How long before this is robust? How tedious will all that coding be? And the resulting code will be huge.
This method can do the same in an hour or so and results in a very simple package.
Cheers,
Paul Ibison
March 6, 2008 at 12:53 pm
paul.ibison (3/6/2008)
Brandon, with all due respect, I think you're missing the point.
Judging by some of the questions, some others (Jeff apart) have also never really tried to solve this problem in practice.
Take a CSV file with 50 columns and use your conditional split starting with a single column, and see how long it takes. You'll have to define each of the 50 columns separately, and define the substring function separately for each one. In the example, the CSV columns have no fixed width, so your substring function will have to account for finding the commas, which may be nested inside a quoted string. How long before this is robust? How tedious will all that coding be? And the resulting code will be huge.
This method can do the same in an hour or so and results in a very simple package.
Cheers,
Paul Ibison
I'll admit I've never had to solve this exact problem before, where simply stripping the headers and footers away would do the trick. The header/footer problems I've had to solve have sometimes involved files with complex hierarchical structures from legacy systems, like this:
FILE
ORDERS|2
HDR|100293|987|20080326
ITM|897654|9876.87|3
ITM|098643|76.34|12
FTR|100293|2
HDR|100294|456|20080326
ITM|765432|11.99|6
FTR|100294|1
ENDORDERS|2
CUST|2
HDR|987
ITM|Joe|Jackson|98 Palomino Way|Los Angeles|CA|90823
FTR|987
HDR|456
ITM|Lisa|Lewis|123 Sesame Street|New York|NY|10014
FTR|456
ENDCUST|2
ENDFILE
How does your process work for files like this? Ignoring the header and footer information isn't an option, since you would lose important data along the way: the order numbers and dates, the line item counts, and the other auditing information (such as record counts) included in the file.
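A rough sketch of the kind of custom parser files like this usually call for (plain VBScript here, not the article's method; the file path and field positions are made-up assumptions):

Dim fso, ts, line, fields, currentOrder
Set fso = CreateObject("Scripting.FileSystemObject")
Set ts = fso.OpenTextFile("C:\feeds\orders.dat", 1) ' 1 = ForReading

Do While Not ts.AtEndOfStream
    line = ts.ReadLine
    fields = Split(line, "|")
    Select Case fields(0)
        Case "HDR"
            currentOrder = fields(1) ' keep the order # from the header row
        Case "ITM"
            ' detail row: tag it with currentOrder before loading it
        Case "FTR"
            ' footer row: reconcile its record count for auditing
    End Select
Loop
ts.Close

The point is that the header and footer rows feed the loop's state; they can't just be filtered out up front.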
March 6, 2008 at 1:07 pm
paul.ibison (3/6/2008)
Brandon, with all due respect, I think you're missing the point.
Judging by some of the questions, some others (Jeff apart) have also never really tried to solve this problem in practice.
Take a CSV file with 50 columns and use your conditional split starting with a single column, and see how long it takes. You'll have to define each of the 50 columns separately, and define the substring function separately for each one. In the example, the CSV columns have no fixed width, so your substring function will have to account for finding the commas, which may be nested inside a quoted string. How long before this is robust? How tedious will all that coding be? And the resulting code will be huge.
This method can do the same in an hour or so and results in a very simple package.
Cheers,
Paul Ibison
Unfortunately, in my line of work... an IT solutions vendor... it's part of my job to be a tedious coder. Again, with all due respect to you, I've never seen an example in practice of having header and trailer records along with field names.
March 6, 2008 at 6:05 pm
pnewhart (3/6/2008)
Unfortunately, in my line of work... an IT solutions vendor... it's part of my job to be a tedious coder. Again, with all due respect to you, I've never seen an example in practice of having header and trailer records along with field names.
With all due respect to all of y'all, after these many years I've seen some zany stuff. I'm rarely surprised any more. I write file parsers all the time; I just don't have to use SQL to do the job. I'm also not reading millions of rows at a crack, either.
We all have to play the cards we are dealt. We don't have to like them and we get to complain about them.
ATB,
Charles Kincaid
March 6, 2008 at 7:28 pm
Charles Kincaid (3/6/2008)
With all due respect to all of y'all, after these many years I've seen some zany stuff. I'm rarely surprised any more. I write file parsers all the time; I just don't have to use SQL to do the job. I'm also not reading millions of rows at a crack, either. We all have to play the cards we are dealt. We don't have to like them and we get to complain about them.
I agree with your point of view, as I've done the same type of thing and have written many custom file parsers and custom ETL applications over the years. Honestly, I don't think I've ever seen a header/footer file format in which all of the headers and footers could be so easily discarded without a second thought.
March 6, 2008 at 8:52 pm
How does your process work for files like this?
I don't believe it was meant to... it wasn't advertised that way, except maybe by a slightly misleading title.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 6, 2008 at 9:16 pm
Jeff Moden (3/6/2008)
I don't believe it was meant to... wasn't advertised that way except maybe by a slightly misleading title.
Yes, catchy title. When I first clicked on it I actually expected it to be about the type of file I was asking about. The author's right, though: I've never had any need to solve this very specific problem. I was kind of wondering how common this specific problem is compared to, say, ANSI X12 EDI and other formats where simply stripping headers and footers is not an option.
March 6, 2008 at 9:24 pm
Mike C (3/6/2008)
Yes, catchy title. When I first clicked on it I actually expected it to be about the type of file I was asking about. The author's right, though: I've never had any need to solve this very specific problem. I was kind of wondering how common this specific problem is compared to, say, ANSI X12 EDI and other formats where simply stripping headers and footers is not an option.
Heh... guess it depends on what business you're in... I've been fortunate in not ever having to mess with EDI files. I have had to do exactly what's in this article a couple of dozen times. Not necessarily using the methods in this article each time, but same problem files.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 6, 2008 at 9:36 pm
Jeff Moden (3/6/2008)
Heh... guess it depends on what business you're in... I've been fortunate in not ever having to mess with EDI files. I have had to do exactly what's in this article a couple of dozen times. Not necessarily using the methods in this article each time, but same problem files.
I guess I should have suspected as much, but I've gotten used to working with file formats where the headers and footers (if there are any) group the detail lines for specific reasons. Apparently there's still a proliferation of headers and footers in files out there whose sole purpose is to get in the way and make the ETL process harder 🙂
March 19, 2008 at 4:35 pm
Is it possible to use SSIS to import this same flat file if it doesn't have a ROW delimiter?
I get these files, which are basically text streams; the rows are only delimited by record length.
I've written a utility to handle it now (it's slow and difficult to manage on all my remote sites), but is there anything already in SSIS that would just delimit the rows at a specified length?
Perhaps there's a different approach that doesn't require an external utility?
March 19, 2008 at 5:11 pm
johnr (3/19/2008)
Is it possible to use SSIS to import this same flat file if it doesn't have a ROW delimiter? I get these files, which are basically text streams; the rows are only delimited by record length.
I've written a utility to handle it now (it's slow and difficult to manage on all my remote sites), but is there anything already in SSIS that would just delimit the rows at a specified length?
Perhaps there's a different approach that doesn't require an external utility?
You can use the Bulk Insert task with a BCP format file to import fixed-length record files. You can probably do it in the data flow task as well with some transformations, although I haven't had occasion to do that myself yet.
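For example, a non-XML BCP format file for a stream of three fixed-width fields per record might look something like this (the 9.0 header is the SQL Server 2005 format-file version; the column names and widths are made up, and the empty "" terminators are what tell BCP there is no delimiter):

9.0
3
1  SQLCHAR  0  6   ""  1  CustID     ""
2  SQLCHAR  0  20  ""  2  CustName   SQL_Latin1_General_CP1_CI_AS
3  SQLCHAR  0  10  ""  3  OrderDate  ""

Point the Bulk Insert task (or a BULK INSERT statement's FORMATFILE option) at that file, and every 36 bytes of the stream should come in as one row.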