please help - transforming flat text via script component

  • I have a process that reads in a fixedwidth flat text file (inputfile), runs through a derived column object, then runs through a script component, then to a new fixedwidth text file (outputfile).

    The simple goal I'm aiming for is to do database lookups in the script component to retrieve data and then append that data to the end of the rows of the input file. I'm creating a brand new outputfile so as to not overwrite the original file.

    I'm having some major problems getting the line breaks right on the output file. If the inputfile is in UNIX format with LF as the line break, everything works fine. However, when I'm getting the inputfile in DOS format with CRLF, I appear to be getting an extra CRLF in my outputfile for each row. ????

    I don't know how that is happening, as when I'm reading in the inputfile row I'm only selecting the number of characters that there is of data - it shouldn't even be reading the line feed characters in the file.

    Since there is a conspicuous lack of debugging for the script component, I've put msgboxes outputting the input row for each row. What is driving me nuts is that I can see in the messagebox that after the first row, each *input* row keeps getting a new line break added to the front (possibly coming from the tail end of the row before it??) But these are input rows - they shouldn't even be being modified, should they?

    I don't get why I should be having to mess with line breaks in the first place...is there a straightforward walkthrough of reading in a blasted text file, parsing out pieces of it, and writing it back out to another text file?

    Thanks for any help!

  • What do you have specified as row delimiter in the flat file connection manager?

    ---
    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/

  • The header row delimeter is CRLF, though I've tried LF before; it doesn't make a difference. There is no row delimeter as I've got it set as fixed-width. But the rub is that if I grab just the characters I need from each row, ignoring the end-of-line, it messes it up because I ignore the end of the line. But I don't necessarily know if I'm going to get a CRLF or a LF...and why should it matter? I'm importing fixed-width columns, isn't that what I should be using? I tried ragged-right, but it seems like there was some reason that didn't work for me either.

    Why is there no equivalent to ".readline" like in regular C#?

  • You have equivalent to readline. Just define your layout as one column and use CRLF as your row delimiter and you should be good to go.

    ---
    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/

  • Bumping this back up as I did end up solving my original problem, but have another related question.

    I ended up dumping fixed-width altogether and going with ragged-right. Afterwards it seemed obvious, I think it's just a perception thing. In my mind my file was fixed width, so that's what I should go with. But in SSIS-land if I just want to read to a line delimiter, I want ragged right, even if the width of each row is going to be the same! Anyway....using ragged-right with a known line delimiter worked, and brings me to my current situation.

    Is there any way to handle both CRLF and LF in the same process? We have an automation process that is dropping a file off to my SSIS package. It's actually the same file being pushed twice, but one way sends it directly from UNIX (LF), and the other way is being pre-processed and coming out of Windows (CRLF). SSIS is set to break on LF, and blows up when it hits the windows version of the file.

    We thought we could still break on LF, and then when reading the file use something to replace the CR or \r and we'd be good. SSIS doesn't look to work that way, though. Each row in my input file is 2800 chars long, with either a CRLF or a LF at the end of line. It's set with the OutputColumnWidth of 2800 and to break on LF. If I give it a file with CRLF, it blows up trying to truncate it because there's an extra "character".

    I think I could get around it by upping the columnwidth to 2801, but then if I give it a file with just LF it's not going to like it because it's too short, right?

    Is there any way to accommodate both ways here? Can I dynamically set the ColumnDelimeter in a variable or something at run time? Anything else? Thanks for any info!

  • You can pre-process the file with LF only and replace it with CR LF. You may check this script for reference.

    ---
    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply