July 11, 2010 at 9:32 pm
Hi Team,
I have around 1GB tab delimited file that contains around 100 Million records, one of them column is string data type and other columns are numeric. I need to load this file into SQL Server and i need to convert the string datatype to numeric (by extracing numerics only from the string value, for ex: Input= Engineer41 , output=41).
I am using SSIS and thinking of using BULK insert for fast processing, but here the problem is one of the input column is String and should be convert to INT by extracting numerics. how can we do this in SSIS?
Even if i direct load .TAB file and then updating the table in SQL is how much feasible ? I think i need two tables in SQL Server for this solution. One table is first to load as it is TAB file then another table that contains all numeric columns. Then copy all the Table1 data to Table2 thorough Numeric extraction from string column in Table1.
Please advice me a better solution .
July 12, 2010 at 10:19 am
Another option would be to use the Data Flow Script Component. With this component you could employ regular expressions to extract and transform the numeric data and load it directly into your "numeric" table.
I don't know if this is a better option versus the BULK INSERT and SQL UPDATE you mentioned - you would have to test each method to know. My preference would probably be to go with the Script Component on the assumption that extracting the numeric data and inserting new rows into a second table would not be a very efficient process in SQL.
July 12, 2010 at 10:38 am
Ed Zann (7/12/2010)
Another option would be to use the Data Flow Script Component. With this component you could employ regular expressions to extract and transform the numeric data and load it directly into your "numeric" table.I don't know if this is a better option versus the BULK INSERT and SQL UPDATE you mentioned - you would have to test each method to know. My preference would probably be to go with the Script Component on the assumption that extracting the numeric data and inserting new rows into a second table would not be a very efficient process in SQL.
A Derived Column would probably be noticeably faster than a Script Component for this fairly straightforward task ... Probably no need for regex either - surely it's not that complicated?
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
July 13, 2010 at 3:18 am
Be aware of the QOTD from the 5th July
http://www.sqlservercentral.com/Forums/Topic947263-1694-1.aspx
-------------------------------------------------------------------------
Normal chaos will be resumed as soon as possible. :crazy:
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply