Testing ConditionalSplit vs Server-Side WHERE clause

Question

Testing ConditionalSplit vs Server-Side WHERE clause

Brandie Tarvin

SSC Guru

Points: 173080
More actions
April 9, 2008 at 1:29 pm

#121587

Does this sound right to you guys?
I tested a portion of my package that has approximately 10 million + records. I have to pull 9 columns from 2 tables.
I did a single OLEDB source which points to a ConditionalSplit. The split tests for LastName is NULL or Blank verses LastName is Not NULL and Not Blank. The CS leads into a Multicast for the NULLs/Blanks and a UNION for the Not NULLs. The MULTICAST sends the NULL/Blanks to a Flat File destination (for later emailing) and the above UNION. The UNION leads to the single OLEDB Destination.
The above setup takes 1 hour 18 minutes to run.
So I tested an alternative. I set up 2 OLEDB sources which, in the queries, do the NULL/Blank check. The Not Nulls send directly to a destination OLEDB. The NULLS go from the second source to Multicast which again splits between a Flat File Destination and a second OLEDB destination. The two execution trees are separate and the NULL/Blanks execute by themselves with the Not NULLs are still running.
This version runs 1 hour 17 minutes.... HUH?
Am I crazy? Shouldn't there be more of a performance difference between the two since I'm giving the second version additional threads to run against?
Or am I missing something obvious? Like maybe my server just can't handle the load so it doesn't matter how I do this?
BTW, I'm running the actual package on my local machine. Would this make a difference in my performance tests?
As always, any thoughts or comments appreciated. Thanks,
Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

Tim Mitchell SSCoach Points: 15685 More actions · Answer 1

Brandie,

If you are running the package on your workstation, the SSIS processing will happen on that machine (in most cases) instead of the server itself. At first glance, I would say that your key bottlenecks are a) transferring 10M rows of data to your workstation and b) the SSIS processing on your workstation rather than on the server. An hour to process what you've described seems like a long time even for 10M records, but it's possible depending on your network and hardware.

If you can, deploy the package to your SQL server and run it from there. I'll bet you will see significant performance improvement.

hth,

Tim

Tim Mitchell
TimMitchell.net | @tmitch.net | Tyleris.com
ETL Best Practices

Brandie Tarvin SSC Guru Points: 173080 More actions · Answer 2

Ah, well that would explain a lot.

Right now I'm still in Dev & Debug mode. But I guess it wouldn't hurt to disable everything but the container I'm currently working with and try executing it from the server.

Thanks, Tim, for the idea. I appreciate it.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

john.arnott SSChampion Points: 11882 More actions · Answer 3

Just curious, but is there any other processing done to the records besides splitting the no-lastname ones to a flat file? If not, my inclination would be to have one source for all rows, then multicast with one output (all rows) going to the destination and the other going to a conditional split to isolate the blank/nulls into your flat file. This would eliminate the Union.

Brandie Tarvin SSC Guru Points: 173080 More actions · Answer 4

John,

Actually, there wasn't much processing going on after the Split. And over the weekend, I just discovered what you're referring to (even though I didn't read your post until this morning).

If I have 1 source, but do multiple destinations (with two splitting off from the MULTICAST) and forgo the UNION entirely, the package does run faster by almost 10-15 minutes.

After the Conditional Split, I added in an Error output, a Default output (both of the former to see how many records I would get in those paths), a NotNULLName output and a NULLName output with the MULTICAST at the end of it for the flat file and the OLE DB destination...

Hmm. This lack of UNION thing bears some consideration. I might have to change some of the other steps of my package to take advantage of it.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.