If you're anything like me, you get a queasy feeling in your stomach when
you're copying a piece of code over and over again, reusing it in different
places - you just know that it will come back and bite you, usually in
the shape of an ugly, hairy beast, growling "copies not synchronized, psycho." I
sure felt it breathing down my neck when setting up a benchmark recently; this
article is all about how to tame that beast - here's my original starting point:
The five tasks at the bottom (Simplex Benchmark to Scary Benchmark)
all produce the same end result, but do it in five different ways (I'll
leave the details of that to another article.) The five tasks at the top though (Read Tables 1 to Read Tables 5), they all do the
exact same thing, namely reading the same two tables from start to end - their
purpose is to give each benchmark a reasonably consistent initial environment.
Tip: To get reliable results, I need to control not only what happens in the database and on the host while the benchmark is running, but also put the whole system in a consistent state before starting each benchmark. More on that at a later date. |
Copy&paste
Let's not forget, copy&paste is the first level of code reuse - the
fact that you can so easily select a number of components and duplicate them
somewhere else in your package, or even in a different package, is a real
timesaver, both when creating exact duplicates like in the above screenshot, and
when using it as a starting point for creating new functionality.
Right now though, I can't stop glancing over my shoulders, I really
want to get rid of all but one of those Read Tables tasks; it's time to
refactor the solution by extracting the duplicated functionality, and put
it in a single spot, making it easy to maintain.
I remember that the precedence constraints between tasks in the control flow
does not allow circular paths - a task can not be linked to itself, not even
indirectly via another task, so that's out.
Let's check out the
Simple for loop
It's instructive to think about for loops and for each loops as
not just being very useful iterators of numbers, files etc., but also as being
very (common and) useful ways of reusing code:
If I configure my for loop to do five iterations, I will have reused the
Read Tables and A Benchmark tasks five times, just perfect! Except that
in my particular case, I need five different benchmarks, which leads us
to create a
Not so simple for loop
My for loop is configured to set the variable Step to {1, 2, 3, 4,
5}, in that order. This will be the clock tick that makes our construct work,
changing the control flow each time.
We use a single copy of Read Tables, and five different benchmark
tasks. I've also numbered the benchmark tasks from 1 to 5 to match up with the
for loop Step values - this visual aid makes it easier to see in which order
tasks will execute, even after moving them around in the layout pane.
The final part is to drag the precedence constraints from Read Tables
to each of the benchmark tasks, and then double click each link (or right click
and "Edit...", or use the properties window to set the corresponding values) and
configure them like this, changing the number to "1" through "5" for the five
links:
When we run this, the execution order will be:
- Read Tables
- 1. Simplex Benchmark
- Read Tables
- 2. Duplex Benchmark
- Read Tables
- 3. Complex Benchmark
- Read Tables
- 4. Huge Benchmark
- Read Tables
- 5. Scary Benchmark
Now this seems like a perfect fit for my problem in terms of functionality
and complexity, so as long as I only need this simple sequence, this is what
I'll use.
Tip: In the Precedence Constraint Editor above we specified Evaluation operation = "Expression". This has the side effect of allowing the benchmark tasks to run even if the Read Tables task fails! An alternative is to set Evaluation operation = "Expression and Constraint" - this will stop the precedence from triggering unless Read Tables actually succeeds. Tip: You can only set a precedence constraint if you have a preceding task! For instance, if you needed to run Read Tables after each benchmark, you would still need a preceding task to set the constraints against - simply use a no-op as a preceding task, such as a Sequence Container with nothing in it. |
But requirements change! What if I need to run three benchmark tasks in a
more complex sequence? Let's investigate the
Downright complex and best avoided for loop
- Complex Benchmark
- Duplex Benchmark
- Simplex Benchmark in parallel with Duplex Benchmark
- Simplex Benchmark in parallel with Complex Benchmark
- Duplex Benchmark in parallel with Complex Benchmark
- Complex Benchmark
- Duplex Benchmark
Here we have both repeating sequences of tasks, and tasks that sometimes run
sequentially, and sometimes in parallel.
One way to do this is to start with the Not
so simple for loop, and add more complex precedence expressions:
- We have seven steps, so make the loop variable Step go from 1 to 7
- For every benchmark, check which steps it should run in, i.e. Complex
Benchmark runs in step 1, 4, 5, 6
- For every benchmark, set the precedence expression to evaluate to true for the identified
steps, i.e. "@Step==1 || @Step>=4 || @Step<=6"
Are we having fun yet??? Depending on your answer, feel free to impress me
greatly (and get weird looks from your spouse I bet) by having that table driven
Finite State Machine on my desk by Monday, 9am sharp! To implement it
though, instead of Downright complex... I suggest you use the
Getting silly for loop
In the previous Downright complex... for loop we had our control logic
spread out across many precedence expressions. That's much too easy to get
wrong, so let's refactor again. Now we have centralized all the logic in
the single script called Logic, where we can configure arbitrary control
flows:
To handle the complex sequence, construct it thusly:
- Configure the for loop to iterate Step from 1 to 7, corresponding
to our seven steps in the sequence. Also add an integer variable BitMask to the
for loop.
- Name the benchmark tasks as 1, 2, 4 (continuing with 8, 16, 32, 64 etc.
for any additional tasks)
- Add code to the Logic script to set the bits in BitMask
according to which step in the sequence is executing - a set bit means the
corresponding benchmark task will execute in the current step:
Enum Bits SimplexBenchmark = 1 DuplexBenchmark = 2 ComplexBenchmark = 4 End Enum
Select Case CType(Dts.Variables("Step").Value, Integer) Case 1 Dts.Variables("BitMask").Value = Bits.ComplexBenchmark Case 2 Dts.Variables("BitMask").Value = Bits.DuplexBenchmark Case 3 Dts.Variables("BitMask").Value = Bits.SimplexBenchmark + Bits.DuplexBenchmark Case 4 Dts.Variables("BitMask").Value = Bits.SimplexBenchmark + Bits.ComplexBenchmark Case 5 Dts.Variables("BitMask").Value = Bits.DuplexBenchmark + Bits.ComplexBenchmark Case 6 Dts.Variables("BitMask").Value = Bits.ComplexBenchmark Case 7 Dts.Variables("BitMask").Value = Bits.DuplexBenchmark End Select
- Set the precedence expressions for the three benchmark tasks to trigger
when their corresponding bit in BitMask is set:
- (@BitMask & 1) != 0
- (@BitMask & 2) != 0
- (@BitMask & 4) != 0
Tip: Remember that we can have other tasks and containers running in parallel with our for loop - they can signal external events to our Logic script via variables for instance. Just don't go overboard with this unless it really is for a good cause... Tip: Instead of using the for loop AssignExpression to change Step, you might want to use the Logic script to update it - that's especially useful when you want to 'jump around' in the sequence, say "Go from step 5 to step 2 if today is Saturday or Sunday", and when the sequence should 'run forever'. |
No downloadable package today, I've left that as an exercise to you the
reader (I really hate it when writers use cop outs like that. Which is
exactly why I did say that, just now. Don't get mad, get even.-)
All in all, it is a bit silly, but in an irresistible kind of way -
just like my wife actually, and we had Love At First Sight! So don't be
surprised if this thingumajig pops up again.
Stop, I'm gagging!
I'll give that poor loop a rest for a while. As a heads up though, here are
some more ways to implement reuse:
- Using the loop counter to effect different behavior inside a task
- Putting SQL code in variables and external files
- Calling child packages
- Using event handlers
- Writing .NET assemblies
- Creating custom components
These are all pretty hefty subjects in their own right; we'll save them for
another day.
Ergo sum
- Precedence constraints can not have circular paths
- For loops are very useful for implementing reuse as well as for
implementing simple and complex control flows
- Understand the various ways of doing reuse - which technique and what
level of complexity is appropriate in your case, taking the foreseeable future
of your implementation into account?
- The Getting silly for loop is one way to implement Finite State
Machines in SSIS
There is truly an enormous amount of functionality packed inside Integration
Services, to the extent that it feels more like a 2.7 version than the 1.0 it
really is. As we've seen, there are many ways to accomplish code reuse, which is
lucky indeed, 'cause as you know, I sure don't like the sound of that
growling beast...
Kristian Wedberg is a Business Intelligence consultant living outside London,
UK. He can be reached at this email address: trk2061 [at] wedberg.name
- Disclaimer: Opinions and comments expressed herein are my own, and does not
necessarily represent those of my employer.