Parallelizing Slow Parts of the Data Flow - Part 1 - Preparation

It's quite common to have parts of your Data Flow that are slow, and there are some techniques you can use to improve performance of those parts. One of them is to try to parallelize the slow operation - but that only works for operations that are parallelizable. Sort isn't one of them - sorry to say. And you really have to know how the data flow works to make sure you don't try fixing a problem that looks "serial" but is, in fact parallel - you'll only make things worse.
Identifying Parallelization Opportunities
So what scenarios does parallelization work for?
Positive Signs
Here are some characteristics that should identify a good candidate for parallelizing a portion of your data flow.

The operation applies independently to single rows, or to an easily identifiable subset of rows.
There's a known-slow resource in use that allows concurrent use (SQL Server, disk, web, ...)

Counter Indications
If your data flow has any of the following characteristics, it's not likely that parallelizing (alone) will help speed anything up.

The server's CPU is at or near capacity during the slow section of the data flow.
Any external resources (SQL Server, file system, network) the slow section of the data flow uses are already at or near capacity.
Some part of the slow section requires all rows at once - an aggregate or sort, for example.

Test, Backup, Change, Test, Backup, Change, ...
After applying a parallelization technique, you'll need to figure out if it's better, right? Then the first thing you have to do is measure the performance of your current design (obviously!). Make sure you measure a few things - time to completion, CPU, RAM, network, and disk utilizations. Run the test at least three times in a row to eliminate caching effects and get a good average.
We measure all those attributes so that we can see what we're trading speed for. The idea is to trade CPU utilization for speed - but we could have to pay RAM and I/O as well - to a degree you might not be comfortable with. That's why we keep an eye on those metrics.
Before you make any changes... make sure you back up your original! Your quest to improve performance using one technique might work somewhat, and you may want to try another technique. If you have a backup of your original package, it'll (probably) be easier to implement the second technique from there, while ensuring your actual logic remains consistent. Do also back up your attempts - you might try a few techniques but decide the first one worked "best".
Make your change to parallelize the package, then double-check the result is the same as your original. It doesn't do any good to make your data go faster if you ruin it in the process.
Be Prepared For Disappointment
Even if your slow section seems to fit the profile I outlined, there's no guarantee applying a parallelization technique will help. You may not be able to squeeze any more time out of the process due to some limiting factor you weren't aware of when you started. Or the trade of speed for some other resource just isn't something you're ready to make.
What follows will be posts on using the Balanced Data Distributor, then a "poor man's" BDD using existing components for those of you that can't install additional software (even when published by Microsoft).

Book Review: Big Red - Voyage of a Trident Submarine

by Andy Warren

SQLServerCentral.com

Blogs

I've grown up reading Tom Clancy and probably most of you have at least seen Red October, so this book caught my eye when browsing used books for a recent trip. It's a fairly human look at what's involved in sailing on a Trident missile submarine...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-03-10

1,439 reads

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

by Robert Davis

SQLServerCentral.com

Blogs

Question: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? This question was sent to me via email. My reply follows. Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? Databases to be mirrored are currently running on 2005 SQL instances but will be upgraded to 2008 SQL in the near future.

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-23

1,567 reads

Inserting Markup into a String with SQL

by Phil Factor

SQLServerCentral.com

T-SQL

In which Phil illustrates an old trick using STUFF to intert a number of substrings from a table into a string, and explains why the technique might speed up your code...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-18

1,631 reads

Networking - Part 4

by Andy Warren

SQLServerCentral.com

Blogs

You may want to read Part 1 , Part 2 , and Part 3 before continuing. This time around I'd like to talk about social networking. We'll start with social networking. Facebook, MySpace, and Twitter are all good examples of using technology to let...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-17

1,530 reads

Speaking at Community Events - More Thoughts

by Andy Warren

SQLServerCentral.com

Blogs

Last week I posted Speaking at Community Events - Time to Raise the Bar?, a first cut at talking about to what degree we should require experience for speakers at events like SQLSaturday as well as when it might be appropriate to add additional focus/limitations on the presentations that are accepted. I've got a few more thoughts on the topic this week, and I look forward to your comments.

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-13

360 reads

Parallelizing Slow Parts of the Data Flow - Part 1 - Preparation

Rate

Share

Share

Rate

Parallelizing Slow Parts of the Data Flow - Part 1 - Preparation

Rate

Share

Share

Rate

Related content

Book Review: Big Red - Voyage of a Trident Submarine

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

Inserting Markup into a String with SQL

Networking - Part 4

Speaking at Community Events - More Thoughts