I was recently tweaking a few Data Flows, and settled on a very small but very useful script for recording execution timing inside the flow.
Unfortunately, performing a decomposition analysis can take quite a while. It's time-consuming to replace the destination component with a Row Count transformation, run the flow, record the results, and then repeat the process further up the flow.
Adding a Performance Wiretap is Easy
Especially when the code is completely copy-and-paste! Drop a Script Component (configured as a transformation) into your flow and paste this code over the generated ScriptMain class - you don't have to select any input columns, and you don't have to change anything in the script itself. I couldn't measure any performance impact from including this script, and didn't expect any: it doesn't use any columns in the data flow, and the operations it does perform are very lightweight. The events it logs include the name you give the component in the data flow (which must be unique), so you can easily identify which of several wiretaps is reporting information.
public class ScriptMain : UserComponent
{
    #region Statistic collectors
    private DateTime _preExecuteTime;
    private DateTime _firstRowTime;
    private DateTime _lastRowTime;
    private int _numRows;
    #endregion

    #region PreExecute
    // This method is called before any rows are read from any sources.
    // We reset all of the statistics here.
    public override void PreExecute()
    {
        base.PreExecute();
        // Remember when the component got ready to run - handy if you
        // also want to measure the wait before the first row arrives
        this._preExecuteTime = DateTime.Now;
        this._firstRowTime = DateTime.MinValue;
        this._lastRowTime = DateTime.MinValue;
        this._numRows = 0;
    }
    #endregion

    #region ProcessInput
    // This is NOT the per-row override you normally see in a script component.
    // This override is called once for each input buffer (set of rows) that
    // is passed into this script component. We count rows here, and record
    // the first and last "row seen" times.
    public override void ProcessInput(int InputID, Microsoft.SqlServer.Dts.Pipeline.PipelineBuffer Buffer)
    {
        base.ProcessInput(InputID, Buffer);

        #region Record the time the first row was seen
        if (this._firstRowTime == DateTime.MinValue)
        {
            this._firstRowTime = DateTime.Now;
        }
        #endregion

        #region If this buffer has any rows (the last buffer typically does NOT), record the time
        if (Buffer.RowCount > 0)
        {
            this._lastRowTime = DateTime.Now;
        }
        #endregion

        #region Add the number of rows in this buffer to the row count statistic
        this._numRows += Buffer.RowCount;
        #endregion
    }
    #endregion

    #region PostExecute
    public override void PostExecute()
    {
        bool fireAgain = true;
        base.PostExecute();

        #region Output statistics to package logging, as Information events
        ComponentMetaData.FireInformation(0, "",
            ComponentMetaData.Name + " first row seen at " + this._firstRowTime.ToString("HH:mm:ss.fff"),
            "", 0, ref fireAgain);
        ComponentMetaData.FireInformation(0, "",
            ComponentMetaData.Name + " last row seen at " + this._lastRowTime.ToString("HH:mm:ss.fff"),
            "", 0, ref fireAgain);
        TimeSpan duration = (this._lastRowTime - this._firstRowTime);
        ComponentMetaData.FireInformation(0, "",
            ComponentMetaData.Name + " watched " + this._numRows.ToString("###,###,##0") + " rows pass by in " + duration.TotalMilliseconds.ToString() + "ms",
            "", 0, ref fireAgain);
        #endregion
    }
    #endregion
}
So what does it do? It records the time the first row passes it, the number of rows that go by, and the time the last row passes it.
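For example, a wiretap named "Wiretap 1" (the name and numbers here are made up) would fire three Information events that read:

    Wiretap 1 first row seen at 14:03:05.120
    Wiretap 1 last row seen at 14:03:12.480
    Wiretap 1 watched 1,250,000 rows pass by in 7360ms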
How Can This Help Diagnose Performance?
If you want to know which section of your Data Flow takes the longest - place some of these scripts at various points in the flow. After you run your Data Flow, you'll see how long each section takes - keeping in mind that their execution times will overlap each other. (If you use two scripts, the second script's "first row" may very well be seen before the "last row" passes through the first script.)
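For example (times invented for illustration): if an upstream wiretap reports a first row at 14:00:00.000 and a last row at 14:00:40.000, while the next wiretap downstream reports 14:00:01.500 and 14:00:55.000, the two spans overlap almost entirely - but the fifteen-second gap between the "last row" times tells you roughly how much time the components between those two wiretaps are adding to the run.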
If you're wondering what kind of impact a certain change has on your Data Flow - place this script before and after the affected area. Run your flow before you make the change, and again after. The results will show you how much impact your change had (but remember it won't account for hidden backpressure effects!)
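Comparing two runs is easier when you have a rate to look at. Here's a small, optional addition of my own (not part of the script above) - paste it into PostExecute, after duration has been calculated, and the wiretap will report throughput as well:

    #region Optional: report throughput (rows per second)
    // Guard against a zero or negative span (e.g. when no rows flowed at all)
    if (duration.TotalSeconds > 0)
    {
        ComponentMetaData.FireInformation(0, "",
            ComponentMetaData.Name + " throughput: " +
            (this._numRows / duration.TotalSeconds).ToString("###,###,##0") + " rows/sec",
            "", 0, ref fireAgain);
    }
    #endregion

A before/after pair like 150,000 rows/sec versus 210,000 rows/sec (numbers made up) is much easier to compare at a glance than two sets of timestamps.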
If you're wondering which of several "branches" of a Data Flow combined using a Union All is the slowest, you can use these wiretaps to instrument each one.
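For example, a layout like this (component names are placeholders) gives you per-branch timings right at the merge point:

    Source A --> Wiretap A --+
                             +--> Union All --> Destination
    Source B --> Wiretap B --+

The branch whose wiretap reports the later "last row" time - or the lower row rate - is your laggard.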
Once again - please don't think this is a replacement for correct use of the decomposition technique. Component performance in the Data Flow depends heavily on the other components before and after the area you're interested in - and even on flows executing in parallel in the same (or another) Data Flow.