Parallel SSIS processing

Question

Parallel SSIS processing

rocky_498

SSCertifiable

Points: 6532
More actions
August 15, 2018 at 8:04 pm

#415267

Hi,
I need advice. I have almost 100 files into Network drive. If I process 100 files through "Foreachloop container" it is taking almost 2m to process one file so it is taking almost 200m to process 100 files (as an example). I was thinking to do Parallel processing in SSIS. Please advise what is the best and fastest way to handle this scenario. If I pick 10 processes at the same time then how can pass 10 files name to 10 processes. All files have the same file structure.
Please advice.
Thank You.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 246992 More actions · Answer 1

rocky_498 - Wednesday, August 15, 2018 8:04 PM
Hi,
I need advice. I have almost 100 files into Network drive. If I process 100 files through "Foreachloop container" it is taking almost 2m to process one file so it is taking almost 200m to process 100 files (as an example). I was thinking to do Parallel processing in SSIS. Please advise what is the best and fastest way to handle this scenario. If I pick 10 processes at the same time then how can pass 10 files name to 10 processes. All files have the same file structure.
Please advice.
Thank You.

Are all of the files being loaded into the same target table? If so, going parallel is unlikely to help much, as you'll face a lot of blocking.

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel

rocky_498 SSCertifiable Points: 6532 More actions · Answer 2

Yes, all are going to the same table but the different batch.
Any other better advice to accomplish this?

Phil Parkin SSC Guru Points: 246992 More actions · Answer 3

rocky_498 - Thursday, August 16, 2018 8:16 AM
Yes, all are going to the same table but the different batch.
Any other better advice to accomplish this?

You could try introducing some parallelism & see whether it speeds things up.

Test 1
Add another FEL to your package to execute in parallel with the existing one and split the files 50/50 between the two.

Test 2
As above, but have each FEL pump data into separate staging tables and run an ExecuteSQL task at the end of the load to bring the data together. This avoids the locking issue I referred to previously.

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel

rocky_498 SSCertifiable Points: 6532 More actions · Answer 4

Thanks let me test it.
However, I am stuck with, how to pass file names to all 10 child packages.

What I did so far, loop through all files names and load into table.
Should I create 10 variables and pass the variables to all 10 child packages. It could possible sometimes there are only 5 files into the folder or sometimes more than 10.

Thank You.

Phil Parkin SSC Guru Points: 246992 More actions · Answer 5

rocky_498 - Thursday, August 16, 2018 10:04 AM
Thanks let me test it.
However, I am stuck with, how to pass file names to all 10 child packages.
What I did so far, loop through all files names and load into table.
Should I create 10 variables and pass the variables to all 10 child packages. It could possible sometimes there are only 5 files into the folder or sometimes more than 10.
Thank You.

You made no mention of child packages earlier: what do they do?
File names can be loaded into a table or an object variable using xp_Dirtree, no looping is required. There are many ways to split this list into two chunks (eg, add an IDENTITY column and have dataflow A process even rows and dataflow B process odd rows).

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel