April 9, 2008 at 1:29 pm
Does this sound right to you guys?
I tested a portion of my package that has approximately 10 million + records. I have to pull 9 columns from 2 tables.
I did a single OLEDB source which points to a ConditionalSplit. The split tests for LastName is NULL or Blank verses LastName is Not NULL and Not Blank. The CS leads into a Multicast for the NULLs/Blanks and a UNION for the Not NULLs. The MULTICAST sends the NULL/Blanks to a Flat File destination (for later emailing) and the above UNION. The UNION leads to the single OLEDB Destination.
The above setup takes 1 hour 18 minutes to run.
So I tested an alternative. I set up 2 OLEDB sources which, in the queries, do the NULL/Blank check. The Not Nulls send directly to a destination OLEDB. The NULLS go from the second source to Multicast which again splits between a Flat File Destination and a second OLEDB destination. The two execution trees are separate and the NULL/Blanks execute by themselves with the Not NULLs are still running.
This version runs 1 hour 17 minutes.... HUH?
Am I crazy? Shouldn't there be more of a performance difference between the two since I'm giving the second version additional threads to run against?
Or am I missing something obvious? Like maybe my server just can't handle the load so it doesn't matter how I do this?
BTW, I'm running the actual package on my local machine. Would this make a difference in my performance tests?
As always, any thoughts or comments appreciated. Thanks,
April 9, 2008 at 9:45 pm
Brandie,
If you are running the package on your workstation, the SSIS processing will happen on that machine (in most cases) instead of the server itself. At first glance, I would say that your key bottlenecks are a) transferring 10M rows of data to your workstation and b) the SSIS processing on your workstation rather than on the server. An hour to process what you've described seems like a long time even for 10M records, but it's possible depending on your network and hardware.
If you can, deploy the package to your SQL server and run it from there. I'll bet you will see significant performance improvement.
hth,
Tim
Tim Mitchell
TimMitchell.net | @tmitch.net | Tyleris.com
ETL Best Practices
April 11, 2008 at 8:41 am
Ah, well that would explain a lot.
Right now I'm still in Dev & Debug mode. But I guess it wouldn't hurt to disable everything but the container I'm currently working with and try executing it from the server.
Thanks, Tim, for the idea. I appreciate it.
April 11, 2008 at 2:18 pm
Just curious, but is there any other processing done to the records besides splitting the no-lastname ones to a flat file? If not, my inclination would be to have one source for all rows, then multicast with one output (all rows) going to the destination and the other going to a conditional split to isolate the blank/nulls into your flat file. This would eliminate the Union.
April 14, 2008 at 5:24 am
John,
Actually, there wasn't much processing going on after the Split. And over the weekend, I just discovered what you're referring to (even though I didn't read your post until this morning).
If I have 1 source, but do multiple destinations (with two splitting off from the MULTICAST) and forgo the UNION entirely, the package does run faster by almost 10-15 minutes.
After the Conditional Split, I added in an Error output, a Default output (both of the former to see how many records I would get in those paths), a NotNULLName output and a NULLName output with the MULTICAST at the end of it for the flat file and the OLE DB destination...
Hmm. This lack of UNION thing bears some consideration. I might have to change some of the other steps of my package to take advantage of it.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply