May 9, 2017 at 11:42 am
Hi
I have a staging table with lots of records that need to be moved to the final table. When moving, I need to perform an insert/update based on the primary key value. I have to move the records one by one to track errors in case of any issues. I am trying to determine which method is faster in terms of performance: a WHILE loop in SQL or an SSIS Data Flow task? Can someone provide input?
May 9, 2017 at 12:27 pm
When moving, I need to perform an insert/update based on the primary key value.
This sounds like a job for MERGE to me; there's no need for a loop or SSIS for that.
I have to move the records one by one to track errors in case of any issues.
In other words, you need to analyze your table row by agonizing row. Why? Why not analyze them all at once? That's something SQL Server is very good at doing.
I am trying to determine which method is faster in terms of performance: a WHILE loop in SQL or an SSIS Data Flow task? Can someone provide input?
Again, you don't need a loop or SSIS for this. The ideal solution will likely be a MERGE statement that performs the INSERTS & UPDATES based on the key column. I would make sure that there's a clustered index on that column.
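For example, something along these lines (just a sketch; dbo.FinalTable, dbo.StageTable, and the pk_id/col1/col2 names are placeholders, assuming a single-column primary key):

MERGE dbo.FinalTable AS tgt
USING dbo.StageTable AS src
    ON tgt.pk_id = src.pk_id
WHEN MATCHED THEN
    UPDATE SET tgt.col1 = src.col1,
               tgt.col2 = src.col2
WHEN NOT MATCHED BY TARGET THEN
    INSERT (pk_id, col1, col2)
    VALUES (src.pk_id, src.col1, src.col2);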
-- Itzik Ben-Gan 2001
May 9, 2017 at 1:01 pm
SSIS is better at handling errors, as you can send the problematic rows to a different destination.
T-SQL can be faster and easier to write. But, as mentioned by Alan, it should be done in sets instead of going row by row.
May 9, 2017 at 1:08 pm
I won't be able to do it as a set-based operation because I have to keep track of errors for each row (the expected errors are likely to be foreign key errors). Will SSIS perform better through a Data Flow task, or a stored procedure loop method?
May 9, 2017 at 2:53 pm
ShuaibV - Tuesday, May 9, 2017 1:08 PM
I won't be able to do it as a set-based operation because I have to keep track of errors for each row (the expected errors are likely to be foreign key errors). Will SSIS perform better through a Data Flow task, or a stored procedure loop method?
Then create another table to hold the "errors".
Process the input data once to flag all records that have errors, and output those rows to the errors table.
After validating all the data, process the valid rows into their final destination; the rows in the errors table you ignore and report on.
No need for any loop, and you still report each row that has errors.
One way I did this previously was to have a table with the same layout as the input one, plus one column for each possible error.
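Roughly something like this (a sketch only; the table and column names are invented for illustration, mirroring the fkey1error/fkey2error flags used further down):

CREATE TABLE dbo.StagingErrors
(
    pk_id      int         NOT NULL,   -- same key as the staging table
    col1       varchar(50) NULL,       -- remaining columns mirror the staging table
    col2       varchar(50) NULL,
    fkey1error char(1)     NULL,       -- 'Y' when the row fails the first foreign key lookup
    fkey2error char(1)     NULL        -- 'Y' when the row fails the second foreign key lookup
);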
May 9, 2017 at 3:08 pm
The source and destination tables have the same structure. The source does not have referential integrity keys defined. If I understand correctly, you are suggesting using a Data Flow task to process the data and redirect the error rows to an Errors table, correct?
May 9, 2017 at 4:14 pm
Let's assume the following example.
The standard process would be:
- dataflow -
Input file to Final table
The approach I am suggesting is:
- dataflow -
Input file to Staging table
- execute sql -
validates the data, inserts bad records into the error table - a set-based operation
insert into errortable
select ...
, case when fk_table1.key is null then 'Y' else null end as fkey1error
, case when fk_table2.key is null then 'Y' else null end as fkey2error
from stagingtable
left outer join fk_table1
on ...
left outer join fk_table2
on ...
where fk_table1.key is null
or fk_table2.key is null
...
- execute sql or dataflow - if SQL, it may well be part of the previous step
staging table to final table - excluding bad records
select ...
from stagingtable
left outer join errortable
on ...
where errortable.entry is null
If the input data does not have a unique key value, you can define the staging table with an extra identity column so it can be used to join to the error table.
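For example (again just a sketch with made-up names; the identity column only exists to give each staged row a reliable key):

CREATE TABLE dbo.StagingTable
(
    staging_id int IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- surrogate key used to join to the error table
    col1       varchar(50) NULL,
    col2       varchar(50) NULL
);

-- the error table carries the same staging_id, so the final load can exclude bad rows with
-- left outer join errortable on errortable.staging_id = stagingtable.staging_id
-- where errortable.staging_id is null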