December 19, 2010 at 6:42 am
A typical ETL scenario is loading data into a database from a single text file. During the load, data cleaning and pre-processing are necessary. The text file contains fewer than 100,000 rows.
The text processor creates a temporary table and dumps the text file into it as-is. The next step is inserting the data into approximately 10 tables, each of which has a surrogate identity key.
My question is: is there an efficient way to populate each table? Since the database is normalized, the traditional approach would be:
1) insert row into table A and get the newly inserted identity PK.
2) insert row into table B and get the newly inserted identity PK.
3) insert row into table C with keys in 1 and 2.
4) and so forth..
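To illustrate, the row-by-row pattern described in the steps above looks roughly like this (table and column names here are placeholders, not the real schema):

```sql
DECLARE @idA INT, @idB INT;

-- 1) Insert a row into table A and capture the new identity value.
INSERT INTO TableA (Col1) VALUES ('a value');
SET @idA = SCOPE_IDENTITY();

-- 2) Insert a row into table B and capture its identity value.
INSERT INTO TableB (Col1) VALUES ('another value');
SET @idB = SCOPE_IDENTITY();

-- 3) Insert a row into table C using the keys captured in steps 1 and 2.
INSERT INTO TableC (Col1, FKtoTableA, FKtoTableB)
VALUES ('c value', @idA, @idB);
```

Repeating this block once per source row is what makes the approach slow.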
Even though cursors were avoided, this row-by-row operation is highly inefficient. Are there any solutions or recommendations to address this problem?
Thanks
December 20, 2010 at 2:08 am
This was removed by the editor as SPAM
December 20, 2010 at 5:35 am
Thanks for your feedback stewartc-708166.
I believe that the problem with the suggested approach is that there is no way to relate TableX with #Temp.
To explain myself more clearly, I will illustrate my problem with an example.
TempTable - with a surrogate ID
TempTable will be truncated after the import operation completes.
PK ColA1 ColA2 ColA3 ColA4 ColA5 ColB1 ColB2 ColB3 ColB4 ColC1 ColC2 ColC3 ColC4
Table A
PK Col1 Col2 Col3 Col4 Col5
Table B
PK Col1 Col2 Col3 Col4 Col5
Table C
PK Col1 Col2 Col3 Col4 ColFKtoTableA ColFKtoTableB
Columns TempTable - ColA1, ColA2, ColA3, ColA4 map to TableA. These fields are processed and could be the result of a lookup, string manipulation, or some other type of processing.
Columns TempTable - ColB1,ColB2,ColB3, ColB4 map to TableB
Columns TempTable - ColC1,ColC2,ColC3,ColC4 map to TableC
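In DDL terms, the layout above might look like this (a sketch only; the data types and the IDENTITY/FK definitions are assumptions):

```sql
-- Staging table, truncated after each import run.
CREATE TABLE #Temp (
    PK    INT IDENTITY(1,1) PRIMARY KEY,
    ColA1 VARCHAR(50), ColA2 VARCHAR(50), ColA3 VARCHAR(50),
    ColA4 VARCHAR(50), ColA5 VARCHAR(50),
    ColB1 VARCHAR(50), ColB2 VARCHAR(50), ColB3 VARCHAR(50), ColB4 VARCHAR(50),
    ColC1 VARCHAR(50), ColC2 VARCHAR(50), ColC3 VARCHAR(50), ColC4 VARCHAR(50)
);

CREATE TABLE TableA (
    PK   INT IDENTITY(1,1) PRIMARY KEY,
    Col1 VARCHAR(50), Col2 VARCHAR(50), Col3 VARCHAR(50),
    Col4 VARCHAR(50), Col5 VARCHAR(50)
);

CREATE TABLE TableB (
    PK   INT IDENTITY(1,1) PRIMARY KEY,
    Col1 VARCHAR(50), Col2 VARCHAR(50), Col3 VARCHAR(50),
    Col4 VARCHAR(50), Col5 VARCHAR(50)
);

CREATE TABLE TableC (
    PK   INT IDENTITY(1,1) PRIMARY KEY,
    Col1 VARCHAR(50), Col2 VARCHAR(50), Col3 VARCHAR(50), Col4 VARCHAR(50),
    ColFKtoTableA INT REFERENCES TableA (PK),
    ColFKtoTableB INT REFERENCES TableB (PK)
);
```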
Alternatively, it would be interesting to learn how to implement the above scenario with SSIS.
December 20, 2010 at 5:45 am
if you have a PK in your Temp table, why not just use that PK as your identifier in all your subsequent tables?
--------------------------------------------------------------------------
When you realize you've dug yourself into a hole....Step 1...stop digging.
December 20, 2010 at 6:34 am
I'd prefer not to introduce an association between the destination tables and the temp table.
I would like to avoid schema changes, and the temp table is only a temporarily persisted table: it is truncated after a successful run. I am aware that the truncate operation preserves identity keys.
December 20, 2010 at 6:37 am
it's not an association to the temp table. Just use the temp table's key value as your key value in the destination tables. Instead of the PK in the destination being an identity, just use the temp table's key. If you don't want to change the column from an identity, you can perform an identity insert. If, however, you have values in those tables that you cannot change because this is production, then this idea is not a good one.
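Assuming the destination table starts empty (an assumption; with existing production data the PK values could collide), the identity-insert suggestion above can be sketched as:

```sql
-- Reuse the staging table's surrogate key as TableA's PK.
-- SET IDENTITY_INSERT requires an explicit column list on the INSERT.
SET IDENTITY_INSERT TableA ON;

INSERT INTO TableA (PK, Col1, Col2, Col3, Col4, Col5)
SELECT PK, ColA1, ColA2, ColA3, ColA4, ColA5
FROM #Temp;

SET IDENTITY_INSERT TableA OFF;
```

Note that only one table in a session can have IDENTITY_INSERT set ON at a time, so each destination table has to be loaded in turn.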
--------------------------------------------------------------------------
When you realize you've dug yourself into a hole....Step 1...stop digging.
December 20, 2010 at 6:40 am
Thanks Grasshopper for the input. Indeed the system is already in production. Hence would need to opt for an alternative solution...
December 20, 2010 at 6:51 am
This was removed by the editor as SPAM
December 20, 2010 at 6:59 am
Yes, my mistake. Truncate vs. delete... I guess you are right 🙂
Thanks
December 20, 2010 at 7:06 am
This was removed by the editor as SPAM
December 21, 2010 at 3:30 am
I believe that this solution will work out well.
It is assumed that the columns listed in the predicate are unique,
i.e.
[Table A].Col1 = ColA1
AND [Table A].Col2 = ColA2
AND [Table A].Col3 = ColA3
AND [Table A].Col4 = ColA4
AND [Table A].Col5 = ColA5
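Assuming those five columns do uniquely identify a row in each destination table (the thread's stated assumption), the whole load can be done set-based: insert the distinct rows first, then join back on the natural key to recover the generated identity values. A sketch, using the example names from earlier in the thread:

```sql
-- Load the parent tables with plain set-based inserts.
INSERT INTO TableA (Col1, Col2, Col3, Col4, Col5)
SELECT DISTINCT t.ColA1, t.ColA2, t.ColA3, t.ColA4, t.ColA5
FROM #Temp t;

INSERT INTO TableB (Col1, Col2, Col3, Col4)
SELECT DISTINCT t.ColB1, t.ColB2, t.ColB3, t.ColB4
FROM #Temp t;

-- Recover the identity PKs by joining back on the unique natural keys,
-- then load the child table with its FK columns populated.
INSERT INTO TableC (Col1, Col2, Col3, Col4, ColFKtoTableA, ColFKtoTableB)
SELECT t.ColC1, t.ColC2, t.ColC3, t.ColC4, a.PK, b.PK
FROM #Temp t
JOIN TableA a
  ON  a.Col1 = t.ColA1 AND a.Col2 = t.ColA2 AND a.Col3 = t.ColA3
  AND a.Col4 = t.ColA4 AND a.Col5 = t.ColA5
JOIN TableB b
  ON  b.Col1 = t.ColB1 AND b.Col2 = t.ColB2 AND b.Col3 = t.ColB3
  AND b.Col4 = t.ColB4;
```

If the natural-key columns are not truly unique, the join-back would multiply rows, so that assumption is the crux of this approach.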