Flat File Data Conversion in SSIS package

  • Hi All,

    I have an SSIS package that connects to a csv file on the network.  I run this through the Derived Column Editor to trim up certain data elements, then through the Data Conversion Editor to convert specific elements from string[DT_STR] to a Float. 

    I also need to check one data element of the csv file to insure it is numeric and if not, strip out the unwanted characters and keep the numeric value.

    Unfortunately I am having no success at either task -

    Anyone have any ideas?

  • what is the pattern of the characters i.e.

    12GGG23 or

    GGG151516H or

    11111JJJJJ or ...


    Everything you can imagine is real.

  • First, Thank you for responding 

     

    The data in the csv file looks something like this:

    Date, Company Name, Account, Invoice No, Invoice Amt

    1/1/2005, "Company, Inc.", 12345, 12345678 RI, 1234.32

    1/2/2005, "Company, Inc.", 12345, SX12345678 RI, 3534.99

    The company name causes issues because the column delimiter is a comma. I also need to extract the numeric portion of the Invoice Number.

  • Hi,

    first of all - use TAB as seperator in flat files, most systems do not allow TABs in data fields, thus this causes not as much problems as comma. But maybe you have no influence on this.

    Normally if you have your text fields in this form "text" the use of " as text qualifier in the data connection properties of the flat file should prevent the problems you described.

    The extraction of the numberic portion sounds difficult, never did something like this. Maybe try a fuzzy logic task? It is not elegant, but if you get no other solution you could use derived column tasks using a REPLACE to replace the letters with an empty string.

    Best regards,

    Stefan


    Kindest Regards,

    SK

  • Selecting TAB as the seperator returns only one column from the file.  Additonally, all data points in the text file row may be empty until you get to the last two points (Invoice No, Invoice Amt), so the text file looks like this:

    ,,,,,,,,,1234 RI, 123.32

     

     

  • Hi James,

    of course I meant that you have to use TAB when exporting the data from your source system. It makes no sense to use comma as column seperator when exporting the data and using TAB as seperator when importing. But maybe you cannot deceide which column seperator to use for the export.

    Best regards,

    Stefan


    Kindest Regards,

    SK

  • The file is generated by an outside source and imported to our system so I have very little to do with the format of the file (comma vs tab).  Getting the impression I need to use a vb.net script to extract the data.

  • James,

    Make sure you identify the quote " as your field qualifier when defining the text file source.  That will signal the parser that anything within quotes is an atomic value even if it contains your delimiter.

    hth

    Tim

    Tim Mitchell, Microsoft Data Platform MVP
    Data Warehouse and ETL Consultant
    TimMitchell.net | @Tim_Mitchell | Tyleris.com
    ETL Best Practices

  • James -

    Your original example had double-quotes (") around the company name.  If that's consistant then setting that as your text qualifier will handle things okay.  The only time this will be a problem is when a double-quote is used within the data, such as 36".

  • This was removed by the editor as SPAM

  • This was removed by the editor as SPAM

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply