You need to run a number of benchmarks, testing each with different parameters and data sizes, not to mention algorithms, while documenting run times and run parameters, over night unattended operation is a given, and just about now your task is getting more than a bit painful, and (hey, it sure has happened to me), if you're not fully automated, by now you're starting to loose some run results out of either exhaustion or boredom.
To the rescue, this Swiss Army Knife Gizmo provides a framework that gives you a compact way to specify what needs to happen, and then executes the benchmarks accordingly. In keeping with its Swissness, it can of course be equally useful for any production work that has similar requirements. Let's cut our teeth on:
A Minimal Benchmark
Requirements
|
The solution builds upon a previous article, so if the Getting silly for loop doesn't ring a bell, you probably want to check that out. The reusable gizmo consists of a for loop and the Logic script:
Benchmark implementation
|
On package execution:
- The for loop does two iterations, corresponding to the two FSM.AddState() calls
- The Logic script will ensure Benchmark runs twice, setting NumberOfRecords appropriately each time
- The Logic script also outputs descriptive info and elapsed times into the file specified with ReportBaseFilename:
StartTime,CurrentTime,Type,NumberOfRecords,ElapsedTime,Info 2005-09-08 17:47:01.000000,2005-09-08 17:47:01.667174,"Start",0,,"" 2005-09-08 17:47:01.000000,2005-09-08 17:47:01.667174,"RunInfo",0,,"Article final run." 2005-09-08 17:47:01.000000,2005-09-08 17:47:01.667174,"BenchmarkInfo",0,,"Run a script twice with different values for the variable NumberOfRecords." 2005-09-08 17:47:01.000000,2005-09-08 17:47:01.667174,"SystemInfo",0,,"Windows XP SP2. SQL Server 2005 Developer Edition, June CTP. Single CPU AMD 1800+. 1.5GB memory. Data and log files on a single disks, simple recovery model." 2005-09-08 17:47:01.000000,2005-09-08 17:47:03.910400,"Benchmark",100000,2.2231968,"" 2005-09-08 17:47:01.000000,2005-09-08 17:47:05.662920,"Benchmark",500000,1.75252,"" 2005-09-08 17:47:01.000000,2005-09-08 17:47:05.662920,"FinishElapsed",500000,4.66292,"" |
Tip: By default the date and time of starting the package is added to the output filename, creating a new output file for each run. To get multiple runs to output to the same file, simply remove the date and time portion from the filename (i.e. edit the Logic script where LogFilename is initialized.)Tip: The results are easy to import into other tools, such as Excel, for further processing and presentation, although you might need to adjust the date format to match with what your tool expects. |
A Larger Benchmark
This simulates a more realistic set of benchmarks, with placeholders for creating and deleting test data, resetting the environment between benchmarks, and varying multiple benchmark parameters, including using a property expression on a key data flow property.
Larger requirements
|
Laid out serially without reuse, this still fairly simple example would require 53 tasks!
Larger benchmark implementation
|
On package execution:
- The for loop does 53 iterations, corresponding to the 53 FSM.AddState() calls that are made on the first iteration
- For each iteration, FSM.ExecuteState() sets a bit in the variable BitMask to make the corresponding precedence expression and task trigger
- The Logic script also outputs descriptive info and elapsed times into the file specified with ReportBaseFilename (not shown here)
As we've seen, very little effort is needed to configure, run and output benchmark information. If you're absolutely dying to start using this, skip ahead and download the gizmo, otherwise check out the gory details making this possible, i.e. the:
Support Code in Logic
NB: The word State is sprinkled throughout the code - a state can be seen simply as one iteration of the for loop, together with the variables to set etc. for that iteration.
The Logic script needs to remember settings between invocations by the for loop. The script task does not allow static script variables that keep their value between invocations, so instead we're using a Class FSMData object stored in a package variable called User::FSMData to keep the necessary information.
Tip: User::FSMData has the extremely useful type System.Object, which can store objects of any type. The Data() property below shows how to allocate and type cast the object.Tip: With StateList As New List(Of State) on the other hand, we can use generics, new in .NET v2.0, removing the need for type casting when accessing the objects. |
FSMData.StateList has one item of Class State for each state added with FSM.AddState(), and the FSMData.CurrentState() property is the iterator for these states.
Private Shared ReadOnly Property Data() As FSMData Get If Dts.Variables("FSMData").Value.GetType().ToString() = "System.Object" Then Dts.Variables("FSMData").Value = New FSMData() End If Return CType(Dts.Variables("FSMData").Value, FSMData) End Get End Property Friend Class FSMData Private CurrentStateValue As Int32 Friend StateList As New List(Of State) Friend StateStartTime As DateTime Friend VariablesToLog As String() Friend LogFilename As String Sub New() CurrentState = 0 LogFilename = Dts.Variables("ReportBaseFilename").Value.ToString() & _ CDate(Dts.Variables("StartTime").Value).ToString("yyyy-MM-dd_HH_mm_ss") _ & ".txt" End Sub Public Property CurrentState() As Int32 Get Return CurrentStateValue End Get Set(ByVal value As Int32) CurrentStateValue = value End Set End Property End Class Friend Class State Friend Name As String Friend BitMask As Bits Friend Variables As Object() Sub New(ByVal _n As String, ByVal _b As Bits, ByVal ParamArray _v As Object()) Name = _n BitMask = _b Variables = _v End Sub End Class |
In A minimal benchmark we saw how AddState() was used to add benchmark runs and variable values. Note that the setVariables parameter can contain many <Variable name, Variable value> pairs, even with different value types!
ExecuteState() is responsible for calling appropriate logging subroutines, setting package variables as specified by AddState() calls, and incrementing the state iterator CurrentState.
' stateName = Nothing => Don't report elapsed time for this state Public Shared Sub AddState(ByVal stateName As String, ByVal bitMask As Bits, _ ByVal ParamArray setVariables As Object()) Data.StateList.Add(New State(stateName, bitMask, setVariables)) End Sub Public Shared Sub ExecuteState() If Data.CurrentState = 0 Then LogInitial() End If ' Log elapsed time of previous state If Not FSM.Empty And Data.CurrentState > 0 Then LogLine(Data.StateList(Data.CurrentState - 1).Name, _ FSM.ElapsedTime.ToString(), "") End If If Data.CurrentState >= Data.StateList.Count Then LogLine("FinishElapsed", (Now() - _ CDate(Dts.Variables("StartTime").Value)).TotalSeconds.ToString(), "") Dts.Variables("BitMask").Value = Bits.ExitLoop Else Dts.Variables("BitMask").Value = Data.StateList(Data.CurrentState).BitMask Dim i As Int32 For i = 0 To Data.StateList(Data.CurrentState).Variables.Length - 1 Step 2 Dts.Variables(Data.StateList(Data.CurrentState).Variables(i).ToString())_ .Value = Data.StateList(Data.CurrentState).Variables(i + 1) Next Data.StateStartTime = Now() Data.CurrentState += 1 ' Set the next state End If End Sub |
That covers the core functionality, and there's just a few additional helper routines such as the logging of info to disk, check just the logic script for those, or the whole package.
Super Gizmo
More ways to use or extend the framework:
- Currently the number and range of variable values is controlled solely by the Logic script. For maximum flexibility, the from/to/step values in the Logic.Main() for loops could themselves be controlled with package variables, allowing them to easily be specified differently for each Super Gizmo package invocation.
- Integration Services provides a configuration type, the Parent Package Variable configuration, for passing values from parent to child packages. Using that, together with an Execute Package Task, you can benchmark external packages.
- I elected to output results to a text file using the File class. You might consider using the built-in logging facilities, store output in the database etc., check out the Dts.Log() function.
- Load states, variables etc. from an external file or database.
- Property expressions can only be set on tasks, not on Data Flow components. This means you can't use a variable to change say the CacheType of a particular Lookup transform inside a data flow task. You can of course use the task switching abilities we've covered to switch between different data flow tasks, thereby optimizing performance.
- To go back to previous states depending on some script code you add, I suggest:
- Have AddState() return the number of the state added, and use that for any branching logic you add
- Add a parameter nextState to AddState() that specifies the number of the next state (instead of the currently implemented monotonously increasing state number), or a predefined constant State.Next, State.Same, State.Previous
- Either add your logic directly in Main(), or get funky adding delegates to states
Conclusion
- Currently, out-of-box SSIS lacks some obvious features for reuse, but makes up for it with easy to use flexibility, allowing the creation of the functionality you need
- The described framework is very useful for running multiple tasks many times with different variable settings
- It can also be used for production work, in which case controlling the variable from/to/step values from outside the Logic script is possibly a very useful extension
In short, don't leave home without it !-)
Kristian Wedberg is a Business Intelligence consultant living outside London, UK. He can be reached at this email address: trk2061 [at] wedberg.name
- Disclaimer: Opinions and comments expressed herein are my own, and does not necessarily represent those of my employer.