I was recently tweaking a few Data Flows, and settled on a very small but very useful script for recording execution timing inside the flow.
Unfortunately, performing a decomposition analysis can take quite a while. It's time-consuming to replace the destination component with a Row Count transformation, run the flow, record the results, and then repeat the process further up the flow.
Adding a Performance Wiretap is Easy
Especially when the code is completely copy-and-paste! Drop a Script Component (configured as a transformation) into your flow and paste this code over the generated ScriptMain class - you don't have to select any input columns, and you don't have to change anything in the script itself. I couldn't measure any performance impact from including this script, and didn't expect any: it doesn't use any columns in the data flow, and the operations it does perform are very lightweight. The events it logs include the name you give the component in the data flow (which must be unique), so you can easily identify which of several wiretaps is reporting information.
public class ScriptMain : UserComponent
{
    #region Statistic collectors
    private DateTime _preExecuteTime;
    private DateTime _firstRowTime;
    private DateTime _lastRowTime;
    private int _numRows;
    #endregion

    #region PreExecute
    // This method is called before any rows are read from any sources.
    // We reset all of the statistics here.
    public override void PreExecute()
    {
        base.PreExecute();
        // Remember when the component got ready to run - handy if you
        // also want to measure the wait before the first row arrives
        this._preExecuteTime = DateTime.Now;
        this._firstRowTime = DateTime.MinValue;
        this._lastRowTime = DateTime.MinValue;
        this._numRows = 0;
    }
    #endregion

    #region ProcessInput
    // This is NOT the per-row override you normally see in a script component.
    // This override is called once for each input buffer (set of rows) that
    // is passed into this script component. We count rows here, and record
    // the first and last "row seen" times.
    public override void ProcessInput(int InputID, Microsoft.SqlServer.Dts.Pipeline.PipelineBuffer Buffer)
    {
        base.ProcessInput(InputID, Buffer);

        #region Record the time the first row was seen
        if (this._firstRowTime == DateTime.MinValue)
        {
            this._firstRowTime = DateTime.Now;
        }
        #endregion

        #region If this buffer has any rows (the last buffer typically does NOT), record the time
        if (Buffer.RowCount > 0)
        {
            this._lastRowTime = DateTime.Now;
        }
        #endregion

        #region Add the number of rows in this buffer to the row count statistic
        this._numRows += Buffer.RowCount;
        #endregion
    }
    #endregion

    #region PostExecute
    public override void PostExecute()
    {
        bool fireAgain = true;
        base.PostExecute();

        #region Output statistics to package logging, as Information events
        ComponentMetaData.FireInformation(0, "",
            ComponentMetaData.Name + " first row seen at " + this._firstRowTime.ToString("HH:mm:ss.fff"),
            "", 0, ref fireAgain);
        ComponentMetaData.FireInformation(0, "",
            ComponentMetaData.Name + " last row seen at " + this._lastRowTime.ToString("HH:mm:ss.fff"),
            "", 0, ref fireAgain);
        TimeSpan duration = (this._lastRowTime - this._firstRowTime);
        ComponentMetaData.FireInformation(0, "",
            ComponentMetaData.Name + " watched " + this._numRows.ToString("###,###,##0") + " rows pass by in " + duration.TotalMilliseconds.ToString() + "ms",
            "", 0, ref fireAgain);
        #endregion
    }
    #endregion
}
So what does it do? It records the time the first row passes it, the number of rows that go by, and the time the last row passes it.
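For example, a wiretap named "Wiretap 1" (the name and numbers here are made up) would fire three Information events that read:

    Wiretap 1 first row seen at 14:03:05.120
    Wiretap 1 last row seen at 14:03:12.480
    Wiretap 1 watched 1,250,000 rows pass by in 7360ms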
How Can This Help Diagnose Performance?
If you want to know which section of your Data Flow takes the longest - place some of these scripts at various points in the flow. After you run your Data Flow, you'll see how long each section takes - keeping in mind that their execution times will overlap each other. (If you use two scripts, the second script's "first row" may very well be seen before the "last row" passes through the first script.)
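For example (times invented for illustration): if an upstream wiretap reports a first row at 14:00:00.000 and a last row at 14:00:40.000, while the next wiretap downstream reports 14:00:01.500 and 14:00:55.000, the two spans overlap almost entirely - but the fifteen-second gap between the "last row" times tells you roughly how much time the components between those two wiretaps are adding to the run.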
If you're wondering what kind of impact a certain change has on your Data Flow - place this script before and after the affected area. Run your flow before you make the change, and again after. The results will show you how much impact your change had (but remember it won't account for hidden backpressure effects!)
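Comparing two runs is easier when you have a rate to look at. Here's a small, optional addition of my own (not part of the script above) - paste it into PostExecute, after duration has been calculated, and the wiretap will report throughput as well:

    #region Optional: report throughput (rows per second)
    // Guard against a zero or negative span (e.g. when no rows flowed at all)
    if (duration.TotalSeconds > 0)
    {
        ComponentMetaData.FireInformation(0, "",
            ComponentMetaData.Name + " throughput: " +
            (this._numRows / duration.TotalSeconds).ToString("###,###,##0") + " rows/sec",
            "", 0, ref fireAgain);
    }
    #endregion

A before/after pair like 150,000 rows/sec versus 210,000 rows/sec (numbers made up) is much easier to compare at a glance than two sets of timestamps.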
If you're wondering which of several "branches" of a Data Flow combined using a Union All is the slowest, you can use these wiretaps to instrument each one.
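For example, a layout like this (component names are placeholders) gives you per-branch timings right at the merge point:

    Source A --> Wiretap A --+
                             +--> Union All --> Destination
    Source B --> Wiretap B --+

The branch whose wiretap reports the later "last row" time - or the lower row rate - is your laggard.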
Once again - please don't think this is a replacement for correct use of the decomposition technique. Component performance in the Data Flow depends heavily on the other components before and after the area you're interested in - and even on flows executing in parallel in the same (or another) Data Flow.