November 7, 2017 at 12:00 am
Comments posted to this topic are about the item Importing Files in Parallel with SSIS
November 7, 2017 at 6:21 am
Great article!
One question: How do you decide how many instances are required?
November 7, 2017 at 6:35 am
Hi Stan
It would depend on how many threads can run on the processors your server has, but you'd need to test it out on your individual environment.
The purpose is to make higher use of the CPU.
There is a point at which too many threads will start to slow the process down so you just need to try different amounts until you find the right number.
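As a rough starting point, you can key off the logical processor count and tune from there. This is just a minimal sketch, and the SSIS variable name is an assumption rather than something from the article:

// Minimal sketch only: start with the logical processor count and adjust
// after testing, since too many parallel instances can slow things down.
int noOfPackageInstances = Environment.ProcessorCount;

// Inside the article's script task the value could be pushed into an SSIS
// variable (assumed name, and it must be listed in ReadWriteVariables):
// Dts.Variables["User::NoOfPackageInstances"].Value = noOfPackageInstances;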
Hope this helps
November 7, 2017 at 9:01 am
Is there any way you can upload the solution, or a link where we can download it? I have some confusion about how to call the package from the Foreach Loop.
November 7, 2017 at 10:07 am
Hi Mohammad
Here's a zip of the solution.
The package that you call is in the same solution as the package containing the script task with all the code as per the article.
November 7, 2017 at 12:08 pm
I posted a very similar solution over a year ago on http://www.mssqltips.com here. Not accusing you of anything, but I thought my solution would add some value. I have some performance data that really shows how powerful this method can be. Enjoy! Keith
November 7, 2017 at 12:55 pm
Hi Keith, thanks for the feedback.
Had a look at your article and your performance data is a great example of the gains that can be made.
This was something I started playing around with a while back and decided to revisit recently, but if I did come across your article in the past and it seeded the idea in my head, then I must apologize for not making any reference to it.
Paulo
November 9, 2017 at 1:58 am
Great article Paulo.
I followed the steps but I keep getting this error: 'The object in the variable "User::FileList" does not contain an enumerator'.
Any reason for this error?
November 9, 2017 at 2:12 am
Thanks Jowee
I think you might be trying to execute "TestPackage" by itself.
Make sure you set "ParallelFileProcessor" as the startup object for the solution.
It will pass the FileList object to the "TestPackage" package.
November 9, 2017 at 3:27 pm
Awesome!!! Could have used this several months ago 🙂
November 10, 2017 at 1:50 am
@amartinez
It's never too late to go back and rework existing processes 🙂
November 10, 2017 at 1:55 pm
Great article. Thank you for sharing your approach.
Can you explain the acrobatics done here? I don't understand the reason for using the tempList variable:
foreach (string file in Directory.GetFiles(sourceDirectory, fileMask))
{
    fileCount += 1;
    fileGroup = fileCount % noOfPackageInstances;
    List<string> tempList = new List<string>();
    if (allFiles[fileGroup] != null) { tempList = allFiles[fileGroup]; }
    tempList.Add(file);
    allFiles[fileGroup] = tempList;
}
November 13, 2017 at 2:11 am
Thanks Misha
Like I mentioned in the article, my code is probably not the most elegant 🙂
The reason I'm using it is because I can't add to a null array element directly.
There must be a better way of doing this, so if you have any suggestions then please post them here.
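For example, one option might be to initialize each group's list up front so the null check and the tempList copy aren't needed. This is an untested sketch and assumes allFiles is declared as a List<string> array sized to noOfPackageInstances (with System.IO and System.Collections.Generic referenced, as in the article's script):

// Untested sketch: create every group's list before distributing the files,
// so no element of allFiles is ever null.
List<string>[] allFiles = new List<string>[noOfPackageInstances];
for (int i = 0; i < noOfPackageInstances; i++)
{
    allFiles[i] = new List<string>();
}

int fileCount = 0;
foreach (string file in Directory.GetFiles(sourceDirectory, fileMask))
{
    // Round-robin the files across the package instances.
    allFiles[fileCount % noOfPackageInstances].Add(file);
    fileCount += 1;
}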
August 23, 2019 at 9:03 am
We do something similar to import 160 text files (and we get these every day). They are pipe delimited but have a .CSV extension. Each file has a different layout but is consistent every day (so file 1 is different to file 2 but file 1 will have the same layout every day).
We create an XML file with the Bulk Import details needed in there (the XML file is created using a script task).
Our process looks like:-
Load a Manifest file (this contains a list of all the files that have been downloaded)
There are 3 extra columns in here:-
IsDownloaded (we set that to 1 if it's a file ready for import)
IsLoading (a file in the middle of being processed)
IsLoaded (a file that has been processed). This also leaves the IsLoading set to 1
We then load the list of files into an object in SSIS (just doing a "SELECT FileName FROM Manifest WHERE IsDownloaded = 1")
We then loop through the object and collect the next file where IsDownloaded = 1 and IsLoading = 0 (see the sketch after this list)
We then call the load process once for each file (it's the same SSIS package called up to 32 times with a parameter that specifies a filename).
We can load all 160 files (around 500MB) in about 5 minutes.
The loop just carries on calling the actual load package until all the files are loaded.
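For illustration only, the "collect the next file" step could be expressed like this from a script task. It's a rough sketch; the connection string, table name, and column names are assumptions based on the description above, not our actual code:

// Rough sketch only: atomically claim the next downloaded-but-not-loading
// file from the Manifest table and mark it as loading.
using System;
using System.Data.SqlClient;

class ManifestClaim
{
    static void Main()
    {
        string connectionString = "Server=.;Database=Staging;Integrated Security=SSPI;"; // assumed
        string claimSql = @"UPDATE TOP (1) dbo.Manifest
                            SET IsLoading = 1
                            OUTPUT inserted.FileName
                            WHERE IsDownloaded = 1 AND IsLoading = 0;";

        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand(claimSql, conn))
        {
            conn.Open();
            object next = cmd.ExecuteScalar(); // null when nothing is left to claim
            Console.WriteLine(next == null ? "No files to claim" : (string)next);
            // The returned file name would be passed to the load package as its parameter.
        }
    }
}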
August 23, 2019 at 10:42 am
If you really have massive numbers of files, I'd recommend going with CozyRoc's Parallel Loop Task instead.
https://www.cozyroc.com/ssis/parallel-loop-task
Instead of having to spawn multiple independent tasks that all run separately in SSIS (which I've done in the past), can individually fail, and just cause a mess with management, you define a disabled portion of your SSIS package as the part you want to run in parallel, tell it how many times you want it to run in parallel, and it does all the work. My craziest package processes close to 30,000 files in a given day, and it does so without me having to manage up to 30,000 spawned SSIS package executions. Well worth the couple hundred bucks it takes to license the components.