Problem with Importing flat files with embedded (CR)(LF) characters.

  • Could someone assist I have a package which imports data from a flat file the connection manager has been set with a (CR)(LF) as its Row delimiter. There is a column called Comments, the last column to the right which has its text in quotes (text qualifier "). However there are times when the comments include embedded (CR)(LF) characters where the end user of the app hits the return key to skip to a new line, because the row delimiter is set as a (CR)(LF) the Flat file connection manager in SSIS interprets this (CR)(LF) as a new row and the import fails.

    This cannot be a unique problem to me can anyone provide a solution. I was trying to use the Derived Column task to add a append the comments column with a pipe character and change the row delimiter to pipe (|) however I have had no luck with that.

    I have no control of hoe the flat file is formatted its a csv with test column that can have (CR)(LF) whenever a user leaves a comment and hits the return key.

    Good suggestions that already worked are welcomed.

  • Could anyone provide with the below problem, I have posted the problem for over a year now and no one has replied eith any meaningful suggestions and I have still not maaged to sove the provlem.

  • If your data has literal CRLF characters in the data, SSIS won't know the difference between a CRLF that is supposed to be part of the data and a CRLF that represents the end of a row. Most likely you're going to have to write some custom script to shred this.

    Tim Mitchell, Microsoft Data Platform MVP
    Data Warehouse and ETL Consultant
    TimMitchell.net | @Tim_Mitchell | Tyleris.com
    ETL Best Practices

  • Thanks Tim would you have such a script that work, I am not a really not a script savvey, however I can figure what piece of code is doing a times.

  • I'll start by saying that I'm not an SSIS user but I've seen several posts where people have gotten around the problem of embedded delimiters in a comment field IF the comment field is contained in double quotes and you've set the "text qualifier" (not sure how to get there) for the import to be the double quote.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Agreed, Jeff - if it's qualified with quotes or some other character, it's a much easier problem to solve.

    Lacking that, to the OP, I would expect that a problem such as that with unqualified text containing literal CRLF would require a highly customized script. Unfortunately, you're probably looking at a lot of trial and error to do this.

    Tim Mitchell, Microsoft Data Platform MVP
    Data Warehouse and ETL Consultant
    TimMitchell.net | @Tim_Mitchell | Tyleris.com
    ETL Best Practices

  • If you read the original post, you'll see this:

    ...Comments, the last column to the right which has its text in quotes (text qualifier ")

    ...so the qualifier is there.

    This is a little bit different from the usual 'quote within a quote' problem, however.

    The problem here is more along the lines of "one or more row terminators within quotes", which takes the problem up a level in difficulty, I think.

    If it were me, I'd be looking at firing up a script component & getting my hands dirty with some text manipulation. It's not a quick job and it won't be pretty.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Thanks Phil for highlighting where I initially pointed out that the comments column is already in double quotes, I think the supplier of the flat file has already done the right thing by adding the double quotes infact there are tripple quotes at times, and I think I have read that text with double quotes should treat all characters as literal text. So I don't know why I get this problem, I have to be checking the file daily and backspacing out the CRLF before the SQL Server Agent job that runs the file kicks off.

  • kingdonshel (8/3/2015)


    Thanks Phil for highlighting where I initially pointed out that the comments column is already in double quotes, I think the supplier of the flat file has already done the right thing by adding the double quotes infact there are tripple quotes at times, and I think I have read that text with double quotes should treat all characters as literal text. So I don't know why I get this problem, I have to be checking the file daily and backspacing out the CRLF before the SQL Server Agent job that runs the file kicks off.

    Have you actually setup the job to use the double quote as a text qualifier?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Actually no I have not used the """" quotes as a text qualifier, the reason for this is that only the Comments column has the double quotes, the other columns in the text file don't. Also the Header Row for the file is not double quoted only the text under the Comments columns is.

    in SSIS you can only set a Text qualifier for the Header Row and in this case its not needed.

  • kingdonshel (8/4/2015)


    Actually no I have not used the """" quotes as a text qualifier, the reason for this is that only the Comments column has the double quotes, the other columns in the text file don't. Also the Header Row for the file is not double quoted only the text under the Comments columns is.

    in SSIS you can only set a Text qualifier for the Header Row and in this case its not needed.

    It's been a long time since I used 2005, so this may not be possible for you, but in 2014, you can set the overall text qualifier for the file and then, for each column (in the 'Advanced' node) you can set the 'TextQualified' property to true or false.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • To add to what Phil stated, text qualifiers are usually conditional. That is, the system only considers them when they are present. They aren't treated as if they were a required delimiter.

    Try turning on the double quote as a text qualifier and then try the import. I believe you'll be pleasantly surprised.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I can't write it for you, but I would use Perl or RegEx to strip out any CR/LF short of the end of line as a pre-process step at the file level between receiving the file and subsequent processing. They're both fantastic for dealing with weird problems like this.

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

  • Hi Guys, I am using Bids 2008R2 and I am able to go to the Advance tab on the Control Manager I have set the the Text qualifier property to false and true, now the Flat File Source Task just fails, and when it works it simply stuffs the remainder of the data Comments column after the CRLF in the starting column on the next row. So I have still not managed to work around the hidden characters weather or not I specify "" as the Text qualifier in the General Tab of the connection manager or on the Advance Tab setting the Comments column text qualifier to True or fault.

    I just seem unable to remove the hidden characters CRLF from the Comments column.

    What is even more shocking I thought there would have been solutions readily available that I just don't know about but this appears to be very trick, since I have posted this a year agao and noone really replied only after I repost the topic again this year.

  • I saw someone post the idea of a script task and think this option is your best route.

    I did this a few times, I just imported the data into one column, then set up the script task to to parse the data into fields and do the string manipulation for the comments in the code.

    You should have no trouble finding basic instructions for parsing data through a script task.

    I found this fun when I did it, just took a few hours to code (depending on how many columns of data you have).

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply