DTS conversion to SSIS - File Import Fails on "" and , in text field

  • My company has recently upgraded our servers from SQL Server 2000 to SQL Server 2005. I've been trying to upgrade our DTS packages to SSIS, but the step that imports the file to the DB is failing.

    The import is a CSV file, text qualifier=". There can be double-quotes inside the text column, escaped with another double-quote. It seems to handle "" inside the text field okay, although it doesn't remove the escaping double-quote like DTS did, no big deal, I can replace that. But the problem is when a text field has an escaped double-quote followed by a comma inside the field.

    For example, this csv...

    "1","SC372","Requesting 36"" Model","MI"

    "2","WS302","Sent info on 24"" model, billed for repairs","TX"

    "3","PD389","Requested repair","NV"

    Will load as (sorry about the crappy formatting):

    Column0 | Column1 | Column2 | Column3

    1 | SC372 | Requesting 36"" Model | MI

    2 | WS302 | "Sent info on 24"" model | billed for repairs","TX"

    3 | PD389 | Requested repair | NV

    You can see that line 2 has the error in question. How can I get this to load properly in 2005? It worked just fine in 2000 and continues to work in the Legacy DTS mode in 2005 (also Excel, MySQL, basically every other tool out there), but not in SSIS.

    I'm fairly new to 2005, but have used 2000 for several years. Any tips would be greatly appreciated. I've been struggling with this one for almost a month.

  • Have you applied the latest service pack? (I know dumb question)

    Also have you looked on connect.microsoft.com to see if it is a known issue?

    I can definitely see what you are talking about but I haven't resolved that issue myself..

    CEWII

  • Thanks for your quick response Elliott...

    I do have SP3 installed. I have checked Connect and found a couple of issues related to embedded quotes, but not specifically with the "" + , issue. Neither of the issues that I found had been fixed.

    One of the workarounds suggested using "Log Parser", which we have been testing this week, but it's not perfect.

    Unfortunately I don't have a lot of push on changing the data feed at this point. If I did, what is the preferred file format to use for importing this type of text into the DB? Not only does the data have a lot of embedded quotes, it also can have a lot of embedded carriage returns as well, so I assume a fixed width format is impossible...

    It's been quite frustrating to have something that is so simple and easy in 2000 be impossible in 2005. Am I crazy to think that this is a basic ETL function that should be supported??

  • Normally I'd say fixed width, but the embedded CR/LF put a wrinkle in that. How accomplished are you with .net? If you are fairly good you could write a pre-processor as a control-flow custom SSIS task that could massage the data to something even slightly reasonable.

    As far as "It's been quite frustrating to have something that is so simple and easy in 2000 be impossible in 2005. Am I crazy to think that this is a basic ETL function that should be supported?? "

    I am unsure, but I will say this, the data file that you are working with *feels* kludgey like it wasn't thought out. Embedded CR/LF should be converted into "\r" and "" and such.

    Can I ask a different question, do you HAVE to consume this file, could you query the data directly from the source and avoid the entire problem?

    CEWII

  • My .NET skills are NULL. I've had to step into a dba roll without a lot of training so far... So I've been sure that I was doing something wrong in 2005, because I can get this file to load into 2000, MySQL, Excel, etc. with no problems.

    The data feed comes from a 3rd party, so there's no way for me to query directly unfortunately. It took their IT dept 4 weeks to add a timestamp to the file they were sending me, so a lot of hand-holding is required. The feed was set up before I took over, and it was probably a case of this is close enough, let's just use this.

    If I can push through a format change, what is the preferred format for importing? The feed is from a CRM system that has a lot of email messages/responses, hence the carriage returns and embedded ".

  • I would probably say XML, but I can't remember if SSIS can consume XML natively.. I take it that it is 3rd party outside your company? If not then you might be able to work it. I've seen people get all squeemish when XML in mentioned, usually, "I don't know how to do that" but it is REALLY easy, even if there system can't consume it, creating it is pretty easy even in SQL.. I feel your pain.. This might be a case where you leave that part in DTS and call it that way.. It sucks but it might be the best way..

    CEWII

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply