June 28, 2007 at 10:38 am
Hi All,
I have an SSIS package that connects to a csv file on the network. I run this through the Derived Column Editor to trim up certain data elements, then through the Data Conversion Editor to convert specific elements from string[DT_STR] to a Float.
I also need to check one data element of the csv file to insure it is numeric and if not, strip out the unwanted characters and keep the numeric value.
Unfortunately I am having no success at either task -
Anyone have any ideas?
June 28, 2007 at 4:17 pm
June 28, 2007 at 8:22 pm
First, Thank you for responding
The data in the csv file looks something like this:
Date, Company Name, Account, Invoice No, Invoice Amt
1/1/2005, "Company, Inc.", 12345, 12345678 RI, 1234.32
1/2/2005, "Company, Inc.", 12345, SX12345678 RI, 3534.99
The company name causes issues because the column delimiter is a comma. I also need to extract the numeric portion of the Invoice Number.
June 29, 2007 at 1:44 am
Hi,
first of all - use TAB as seperator in flat files, most systems do not allow TABs in data fields, thus this causes not as much problems as comma. But maybe you have no influence on this.
Normally if you have your text fields in this form "text" the use of " as text qualifier in the data connection properties of the flat file should prevent the problems you described.
The extraction of the numberic portion sounds difficult, never did something like this. Maybe try a fuzzy logic task? It is not elegant, but if you get no other solution you could use derived column tasks using a REPLACE to replace the letters with an empty string.
Best regards,
Stefan
SK
June 30, 2007 at 4:14 am
Selecting TAB as the seperator returns only one column from the file. Additonally, all data points in the text file row may be empty until you get to the last two points (Invoice No, Invoice Amt), so the text file looks like this:
,,,,,,,,,1234 RI, 123.32
July 2, 2007 at 2:06 am
Hi James,
of course I meant that you have to use TAB when exporting the data from your source system. It makes no sense to use comma as column seperator when exporting the data and using TAB as seperator when importing. But maybe you cannot deceide which column seperator to use for the export.
Best regards,
Stefan
SK
July 2, 2007 at 4:45 am
The file is generated by an outside source and imported to our system so I have very little to do with the format of the file (comma vs tab). Getting the impression I need to use a vb.net script to extract the data.
July 3, 2007 at 7:43 am
James,
Make sure you identify the quote " as your field qualifier when defining the text file source. That will signal the parser that anything within quotes is an atomic value even if it contains your delimiter.
hth
Tim
Tim Mitchell, Microsoft Data Platform MVP
Data Warehouse and ETL Consultant
TimMitchell.net | @Tim_Mitchell | Tyleris.com
ETL Best Practices
July 18, 2007 at 9:23 am
James -
Your original example had double-quotes (") around the company name. If that's consistant then setting that as your text qualifier will handle things okay. The only time this will be a problem is when a double-quote is used within the data, such as 36".
January 7, 2024 at 8:54 am
This was removed by the editor as SPAM
January 7, 2024 at 8:55 am
This was removed by the editor as SPAM
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply