May 9, 2017 at 11:42 am
Hi
I have a staging table with lots of records that need to be moved to the final table. When moving, I need to perform an insert/update based on the primary key value. I have to move the records one by one to track errors in case of any issues. I am trying to determine which method is faster in terms of performance: a WHILE loop in SQL or an SSIS Data Flow task? Can someone provide input?
May 9, 2017 at 12:27 pm
When moving, I need to perform an insert/update based on the primary key value.
This sounds like a job for MERGE to me; there's no need for a loop or SSIS for that.
I have to move the records one by one to track errors in case of any issues.
In other words, you need to analyze your table row by agonizing row. Why? Why not analyze them all at once? That's something SQL Server is very good at doing.
I am trying to determine which method is faster in terms of performance: a WHILE loop in SQL or an SSIS Data Flow task? Can someone provide input?
Again, you don't need a loop or SSIS for this. The ideal solution will likely be a MERGE statement that performs the INSERTS & UPDATES based on the key column. I would make sure that there's a clustered index on that column.
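For example, something along these lines (just a sketch; dbo.FinalTable, dbo.StageTable, and the pk_id/col1/col2 names are placeholders, assuming a single-column primary key):

MERGE dbo.FinalTable AS tgt
USING dbo.StageTable AS src
    ON tgt.pk_id = src.pk_id
WHEN MATCHED THEN
    UPDATE SET tgt.col1 = src.col1,
               tgt.col2 = src.col2
WHEN NOT MATCHED BY TARGET THEN
    INSERT (pk_id, col1, col2)
    VALUES (src.pk_id, src.col1, src.col2);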
-- Itzik Ben-Gan 2001
May 9, 2017 at 1:01 pm
SSIS is better at handling errors, as you can send the problematic rows to a different destination.
T-SQL can be faster and easier to write. But, as mentioned by Alan, it should be done in sets instead of going row by row.
May 9, 2017 at 1:08 pm
I won't be able to do it as a set-based operation because I have to keep track of errors for each row (the expected errors are likely to be foreign key errors). Will SSIS perform better through a Data Flow task, or a stored procedure loop method?
May 9, 2017 at 2:53 pm
ShuaibV - Tuesday, May 9, 2017 1:08 PM
I won't be able to do it as a set-based operation because I have to keep track of errors for each row (the expected errors are likely to be foreign key errors). Will SSIS perform better through a Data Flow task, or a stored procedure loop method?
Then create another table to hold the "errors".
Process the input data once to flag all records that have errors, and output those rows to the errors table.
After validating all the data, process the valid rows into their final destination; the rows in the errors table you ignore and report on.
No need for any loop, and you still report each row that has errors.
One way I did this previously was to have a table with the same layout as the input one, plus one column for each possible error.
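Roughly something like this (a sketch only; the table and column names are invented for illustration, mirroring the fkey1error/fkey2error flags used further down):

CREATE TABLE dbo.StagingErrors
(
    pk_id      int         NOT NULL,   -- same key as the staging table
    col1       varchar(50) NULL,       -- remaining columns mirror the staging table
    col2       varchar(50) NULL,
    fkey1error char(1)     NULL,       -- 'Y' when the row fails the first foreign key lookup
    fkey2error char(1)     NULL        -- 'Y' when the row fails the second foreign key lookup
);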
May 9, 2017 at 3:08 pm
The source and destination tables have the same structure. The source does not have referential integrity keys defined. If I understand correctly, you are suggesting using a Data Flow task to process the data and redirect the error rows to an Errors table, correct?
May 9, 2017 at 4:14 pm
Let's assume the following example.
The standard process would be:
- dataflow -
Input file to Final table
The approach I am suggesting is:
- dataflow -
Input file to Staging table
- execute sql -
validates the data, inserts bad records into the error table - a set-based operation
insert into errortable
select ...
, case when fk_table1.key is null then 'Y' else null end as fkey1error
, case when fk_table2.key is null then 'Y' else null end as fkey2error
from stagingtable
left outer join fk_table1
on ...
left outer join fk_table2
on ...
where fk_table1.key is null
or fk_table2.key is null
...
- execute sql or dataflow - if SQL, it may well be part of the previous step
staging table to final table - excluding bad records
select ...
from stagingtable
left outer join errortable
on ...
where errortable.entry is null
If the input data does not have a unique key value, you can define the staging table with an extra identity column so it can be used to join to the error table.
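For example (again just a sketch with made-up names; the identity column only exists to give each staged row a reliable key):

CREATE TABLE dbo.StagingTable
(
    staging_id int IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- surrogate key used to join to the error table
    col1       varchar(50) NULL,
    col2       varchar(50) NULL
);

-- the error table carries the same staging_id, so the final load can exclude bad rows with
-- left outer join errortable on errortable.staging_id = stagingtable.staging_id
-- where errortable.staging_id is null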