Problems importing CSV files using SSIS.

  • I have received some CSV files that I need to simply import into SQL 2005 tables. These files open cleanly in Excel 2007 and import cleanly into Access 2007. But for some reason a simple import into a SQL Server table using the SSIS Import/Export Wizard and a flat-file source doesn't place the data in the correct columns. I need to automate this for nightly updates.

    I have tried using just the defaults and then setting " as a text delimiter, but the import fails.

    I had a similar project a couple years ago when I ended up using Bulk Insert with Format files that I created. But SSIS should make life easier...correct?

    Here is an example of where the import fails. I have specified that columns are comma (,) delimited and the text is delimited with double-quote ("). There are cases where a comma is included within a text or character column (not to be confused with the TEXT datatype).

    "This is the first column","2nd column<font face=""Courier New, Courier"">","3rd column",143,...........

    It is clear to Excel and Access, simple-minded desktop applications, that the comma in "Courier New, Courier" is part of a text string. But the wise and powerful SSIS decides to create a column break there. It should also be obvious that "" within a text string is an escaped double-quotation-mark pair. It drives me crazy that Excel and Access handle this quite well by default while SSIS chokes. Please somebody tell me that I won't have to create format files to explain to the mighty SQL Server that "," is the end of the 1st column and "," the end of the 2nd and maybe ", is the end of the 3rd if the following column contains a number.
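    For what it's worth, any parser that follows the usual CSV quoting rules handles that line without trouble. A quick sanity check in Python (a sketch; the thread's tooling is SSIS/.NET, but the parsing rules being tested are the same ones Excel and Access apply):

```python
import csv
import io

# The example line from above: quoted fields, a comma inside the
# second field, and "" as an escaped double-quote pair.
line = '"This is the first column","2nd column<font face=""Courier New, Courier"">","3rd column",143'

# csv.reader's defaults match the expected behavior:
# delimiter = ',', quotechar = '"', doublequote = True
row = next(csv.reader(io.StringIO(line)))

print(row)
# The comma inside the quotes stays in its field, and "" collapses to ".
```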

    Well I suppose if I must go back to basics and create format files for 15 CSV files with up to 70 columns each, then so be it. But I must say that if that is the case...then where is the money being spent in Redmond? They are starting to look more and more like Government efficiency. LOL

    Don't get me wrong...I'm all for powerful tools. But if they come at the cost of creating more work than necessary, then what is the point? Maybe I'm just too simple minded for complex tools. Hey is that the real Wizard (OZ) I see behind the curtain?

    So much for the rant. Thanks for any help. I know there must be an answer.

  • Well I see that I am not the only one with this problem. It seems that the Flat File import has problems with commas embedded in a text field delimited with ". See this post...

    http://www.sqlservercentral.com/Forums/Topic1026801-148-1.aspx

    Apparently there are more urgent issues to be dealt with regarding SSIS. Interesting - I tried to create a link with the HTML link button in the toolbar, but the pseudo-code is being displayed as plain text, at least in the preview pane.

    If anyone has solved this problem of SSIS not being able to deal with a comma within a CSV text field, please let me know.

    Thanks

  • Well it looks like I am answering my own question. Even SSIS in SQL Server 2008 with SP1 is just not capable of a simple import from a CSV file, one of the most common and transportable data formats in the world!

    I tried using SQL Server 2000 DTS for the CSV Import and it worked perfectly. It even recognized the double-quote text delimiters without any help from me. I should be able to create a DTS package for all the files and then move it to SQL2005 using "backward" compatibility tools (I hope).

    Imagine that...11 years later SQL Server 2008 with Service Pack 1 can’t even perform the most rudimentary import. That's simply shameful Microsoft.

  • I ran into this same issue with SSIS. After lots of complaining amongst my team, having several experienced SSIS developers "take a shot" at the problem, and posting on various forums about the issue, I looked honestly at the source file format and it was true: my CSV file was not consistently formed. SSIS is less tolerant than DTS in this respect. There is also a bug in DTS when processing CSV files in Unicode format, so watch out for that.

    Chances are, if DTS can handle it, you still should not use it πŸ™‚

    To be clear, SSIS absolutely can process CSV files, however they must be consistently formed.

    Can you go back to the data supplier and have them provide a consistent format? Can you pre-process the file with something like PowerShell or a .NET console app to bring the file into a consistent state?
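    A minimal sketch of that pre-processing idea in Python (illustrative only; the file paths and delimiter choice are assumptions, and it presumes the new delimiter never occurs in the data): read the file with a real CSV parser and rewrite it so SSIS only ever sees unambiguous columns.

```python
import csv

def rewrite_csv(src_path, dst_path, delimiter="|"):
    """Re-emit a quoted, comma-delimited CSV as a plain delimited file.

    The CSV reader handles quoted commas and doubled double-quotes for
    us; we then join the clean fields with a delimiter that is assumed
    never to appear in the data itself.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        for row in csv.reader(src):
            dst.write(delimiter.join(row) + "\n")
```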

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Thank you for your response. The data supplier and the data supplied are the same supplier and data that I have been using for 4 years. They are changing the platform that their data uses (upgrading from an old AS400). They are also improving the quality of the data and changing the headings to be more human-readable.

    I originally had the same problem when writing the data import program from the original CSV files. I had problems with SSIS in SQL 2005 back then and ended up writing some BCP scripts using format files. That program has been running smoothly every night for over 3 years now, and the data has been running in a production environment quite well. I'm sure I will be able to use the new data imported with SQL Server 2000 DTS. I simply did not think of trying SQL 2000 back then.

    It comes down to a simple but vexing problem, conceptually easy to understand: DTSX chokes on comma-delimited files when there is a comma within a string field delimited by double quotes. Plain and simple.

    There is nothing wrong with sending data like the following.

    30298,"4,598.00","Then George said, ""Hey Jim, what are you doing?""", 000897,..........

    SQL Server 2000 DTS handles it quite well as does the BCP utility. The data, as you can see, is quite usable. However, there is no way that I can seem to get SQL 2005 SSIS or even SQL 2008 SSIS to handle it.

    They both choke on "4,598.00" and "Then George said, ""Hey Jim, what are you doing?""", splitting the data at the commas when the specs clearly indicate that the comma is part of a character string and the included escaped double-quotes are properly handled.

    My contention is that SSIS should allow the operator (me) to configure it to use the above data. I don't need it to "protect" me from "bad" data. For political business reasons I cannot push the data supplier too hard. I am more than happy to be getting the data at all. And why should I wrestle with 2 or more steps when for years it has been running fine with a single step import to SQL Server staging tables. Do you see an "easy" way to pre-process out the "good" commas from the "bad" commas? I am taxed with multiple deadlines on multiple projects, and this initial import is a minor part of all that.
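    Conceptually, at least, the "good" commas and "bad" commas are easy to tell apart: a comma only ends a field when you are outside quotes. A minimal quote-tracking scan in Python (my own sketch of the logic Excel, Access, and DTS apply, not a production parser):

```python
def split_qualified(line, delimiter=",", quote='"'):
    """Split a delimited line, honoring text qualifiers.

    A delimiter only ends a field when we are outside quotes; a doubled
    quote ("") inside a quoted field is an escaped literal quote.
    """
    fields, buf, in_quotes = [], [], False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == quote:
            if in_quotes and i + 1 < len(line) and line[i + 1] == quote:
                buf.append(quote)  # escaped "" -> literal "
                i += 1
            else:
                in_quotes = not in_quotes  # opening or closing qualifier
        elif ch == delimiter and not in_quotes:
            fields.append("".join(buf))  # "good" comma: field boundary
            buf = []
        else:
            buf.append(ch)  # "bad" comma or ordinary character: keep it
        i += 1
    fields.append("".join(buf))
    return fields
```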

    If anyone can show me how to get SSIS to do this simple and valid thing, I will be ecstatic.

    I have been dealing with data and software programming for 20 years since the early DOS and MainFrame days.

    However, thanks again for your suggestion.

  • I have just spent some time looking at this, and it seems that one of the simplest solutions is to change the column delimiter to something else - I used | and it worked for everything I threw at it.

    If you can get your supplier to do this, great. Otherwise you may like to consider pre-processing the file to rewrite it using a new delimiter.

    There must be some code "out there" somewhere that will do the required parsing for you - and please post back if you do it as I'm sure it would help a lot of people.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Thanks Phil,

    Yes I can see how changing the column delimiter would indeed be the easiest solution. I suppose if Excel, Access and SQL 2000 DTS can handle that, there must be an algorithm that I can use to reformat the data. For now I will probably bite the bullet and spend a day writing format files to use with BCP as I did in the past. The thinking here is that at some point Microsoft will no longer allow backward compatibility with SQL 2000 DTS but will always support direct commands.

    The data suppliers are supplying the data under protest because our customers are requesting it to use in the application that we developed for them. The supplier would prefer to have those customers using THEIR interface to the data.

    I just wish MS hadn't designed a gun that decides when it's OK to unlock the safety, or that requires 2 or 3 steps to do so. I have a mountain lion about to pounce and tear my throat out. By the time I figure out how to release the safety I will be dead meat. I've never heard of an autopilot system that doesn't allow human override. When that time comes...God save us.

    OK, it's time to get back to my world as a software developer that creates business applications to make life easier for people. From Design, to Data, to Implementation, to Testing, to Deployment and finally Support and Feature enhancement. Ah, the life of a Data Specialist must be fun. Diddle with my Data all day long. LOL

    Phil, you seem like an OK guy with lots of life experience. The picture you use as an Avatar looks like me as I deal with this situation.

    I am thinking there must be an algorithm using Regular Expressions that might handle the few different variations of replacing the delimiting commas in the correct locations. You chose a | as the replacement which may be the best choice. I was thinking along the lines of ~~, which I sometimes use when storing a list of short strings in a table column, which after retrieval are Split into an array for use in the ASP.NET application. Hmmm, I'm wondering if SSIS will even allow me to use that character pair. Probably not. LOL

    I will indeed post a solution when I find one.

    Thanks

  • I feel your pain man...there is no official "standard" for CSV files...this is the closest thing out there but it's just an RFC: http://www.rfc-editor.org/rfc/rfc4180.txt

    Your file conforms to section 2, unfortunately the SSIS development team responsible for flat file processing did not adopt this document when developing the CSV file handling piece. That said, onward...

    Even if you change the delimiter to pipe, making your example line look like this, SSIS still will not rid your data of the doubled double-quotes.

    30298|"4,598.00"|"Then George said, ""Hey Jim, what are you doing?"""|000897|..........

    You'll have to turn quoted identifiers off and make your data look like this:

    30298|4,598.00|Then George said, "Hey Jim, what are you doing?"|000897|..........
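    That quote-stripping step is exactly what a proper CSV parser gives you for free. A quick Python sketch (my own example, not FileHelpers code) that turns the first line above into the second:

```python
import csv
import io

# The pipe-delimited line that still carries quoted identifiers:
line = '30298|"4,598.00"|"Then George said, ""Hey Jim, what are you doing?"""|000897'

# Parse with pipe as the delimiter; the reader un-doubles the quotes
# and strips the qualifiers.
row = next(csv.reader(io.StringIO(line), delimiter="|"))

# Re-join with no qualifiers at all:
stripped = "|".join(row)
print(stripped)
# 30298|4,598.00|Then George said, "Hey Jim, what are you doing?"|000897
```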

    I am not sure how much .NET experience you have, but the open-source .NET project FileHelpers (www.FileHelpers.com) can handle CSV files such as yours. I used it extensively in early 2005 for a project that straddled the ushering out of SQL 2000 and the ushering in of SQL 2005...when SSIS was half-baked at best and the client had 150 flat file sources to process, none of which we could ask for changes to the data files. I used FileHelpers to transform the data (it supports custom transformations too) into something usable for us to load. A C# console app using FileHelpers could be written in fairly short order to transform your files from what they are, parsing them as comma-delimited files with embedded text qualifiers, to what SSIS would like them to be: pipe- (or whatever-) delimited files with no embedded delimiters or text qualifiers.


  • Awesome! Thanks so much for that lead.

    My import program is a VB.NET console app that also has a user-interface component option should a human decide to run it manually for any reason. While C# is not my forte, I can modify the code if necessary. If it's just a matter of referencing the C# assembly then that's even better.

    Even if you change the delimiter to pipe making your example line look like this, SSIS still will not rid your data of the doubled double-quotes.

    I had planned on dealing with those when moving the data from the staging tables into the production tables. I will be bypassing SSIS altogether...again.

    After looking at the FileHelpers website, it appears to be just what the Doctor ordered.

    You have made my day! I should be out of the woods by Monday and offering my customers the new data import program shortly afterwards.

    Thanks muchly πŸ™‚

  • I had the same problem in the SSIS 2008 data flow package. I had CSV (comma-delimited) source data with one of the columns quoted with a double quote ("). The import worked fine using the SQL 2000 DTS import, as it recognized the Text Qualifier value as a double quote {"}, but this was not the case when I started to use SSIS 2008. The Text Qualifier shows <None> on the Flat File connection manager object's Text Qualifier property. I tried to change the <None> to what I have in DTS 2000, double quote {"}, but it didn't fix the problem. I kept researching this until I found the simple fix: put only (") in the Text Qualifier property. This worked fine; I got rid of the double quotes in the data, and the data imported successfully as it was in the source, even the data which contained a comma in between.

    Below is the link where I found my answer:

    http://www.mssqltips.com/tip.asp?tip=1316

  • asso.al-hamawandi (4/18/2011)


    I had the same problem in the SSIS 2008 data flow package. I had CSV (comma-delimited) source data with one of the columns quoted with a double quote ("). The import worked fine using the SQL 2000 DTS import, as it recognized the Text Qualifier value as a double quote {"}, but this was not the case when I started to use SSIS 2008. The Text Qualifier shows <None> on the Flat File connection manager object's Text Qualifier property. I tried to change the <None> to what I have in DTS 2000, double quote {"}, but it didn't fix the problem. I kept researching this until I found the simple fix: put only (") in the Text Qualifier property. This worked fine; I got rid of the double quotes in the data, and the data imported successfully as it was in the source, even the data which contained a comma in between.

    Below is the link where I found my answer:

    http://www.mssqltips.com/tip.asp?tip=1316

    You're not serious, right? Did you even read this thread? Try processing the example row in the original post using the method outlined in the article and you'll soon realize the underlying issue.


  • I just spent a whole weekend on this problem. My CSV file contains text data contained within double quotes, e.g. "blah",1,1/10/2011,"blah", plus numeric and date data. All fields are delimited by a comma as dictated by the CSV file format. If your field contains a comma but is enclosed by double quotes, then you must specify (") in the Text Qualifier box on the Flat File Connection Manager Editor form... BUT, if you have already experienced problems with the file import, then you must delete the flat file connection and then create a new instance, otherwise no matter what you do, SSIS will not recognise the text qualifier. It works!!

  • John Kotuby (4/13/2011)


    So much for the rant.

    I'd post a rant of my own but you did a pretty good job of it. πŸ˜›

    I'm trying to find it, but there used to be a "standard" written a long time ago about true Comma Separated Value files, including when and how text qualifiers were used. It all boiled down to what we all know and what Excel can do. Things with embedded quotes or delimiters must be encapsulated in quotes, and the embedded quotes must be doubled up.
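    That doubling rule is still what standard CSV writers emit today. A quick illustration in Python (a sketch; any conforming writer behaves the same way):

```python
import csv
import io

# Writing side of the rule: fields containing the delimiter or a quote
# get wrapped in quotes, and embedded quotes are doubled up.
buf = io.StringIO()
csv.writer(buf).writerow(['plain', 'has, comma', 'says "hi"'])
print(buf.getvalue())
# plain,"has, comma","says ""hi"""
```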

    It's a real shame they did away with the "Jet Drivers" when MS went to 64 bit. I believe you can recover some of the lost functionality by downloading the "Ace" drivers. I don't have a URL for the MS download for those... should take someone only a minute or two of Googling to find it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.



  • BUT, if you have already experienced problems with the file import, then you must delete the flat file connection and then create a new instance, otherwise no matter what you do, SSIS will not recognise the text qualifier. It works!!

    Makes me wonder how many times I've furiously deleted and started over something I "know" should work... and then it does! :angry:

  • Thanks for pointing out the need to use only " rather than "" - saved me many hours of frustration
