SSIS - Code Reuse and Complex Control Flows

If you're anything like me, you get a queasy feeling in your stomach when

you're copying a piece of code over and over again, reusing it in different

places - you just know that it will come back and bite you, usually in

the shape of an ugly, hairy beast, growling "copies not synchronized, psycho." I

sure felt it breathing down my neck when setting up a benchmark recently; this

article is all about how to tame that beast - here's my original starting point:

The five tasks at the bottom (Simplex Benchmark to Scary Benchmark)

all produce the same end result, but do it in five different ways (I'll

leave the details of that to another article.) The five tasks at the top though (Read Tables 1 to Read Tables 5), they all do the

exact same thing, namely reading the same two tables from start to end - their

purpose is to give each benchmark a reasonably consistent initial environment.

Tip: To get reliable results, I need to control not only what happens in the

database and on the host while the benchmark is running, but also put the

whole system in a consistent state before starting each benchmark. More

on that at a later date.

Copy&paste

Let's not forget, copy&paste is the first level of code reuse - the

fact that you can so easily select a number of components and duplicate them

somewhere else in your package, or even in a different package, is a real

timesaver, both when creating exact duplicates like in the above screenshot, and

when using it as a starting point for creating new functionality.

Right now though, I can't stop glancing over my shoulders, I really

want to get rid of all but one of those Read Tables tasks; it's time to

refactor the solution by extracting the duplicated functionality, and put

it in a single spot, making it easy to maintain.

I remember that the precedence constraints between tasks in the control flow

does not allow circular paths - a task can not be linked to itself, not even

indirectly via another task, so that's out.

Let's check out the

Simple for loop

It's instructive to think about for loops and for each loops as

not just being very useful iterators of numbers, files etc., but also as being

very (common and) useful ways of reusing code:

If I configure my for loop to do five iterations, I will have reused the

Read Tables and A Benchmark tasks five times, just perfect! Except that

in my particular case, I need five different benchmarks, which leads us

to create a

Not so simple for loop

My for loop is configured to set the variable Step to {1, 2, 3, 4,

5}, in that order. This will be the clock tick that makes our construct work,

changing the control flow each time.

We use a single copy of Read Tables, and five different benchmark

tasks. I've also numbered the benchmark tasks from 1 to 5 to match up with the

for loop Step values - this visual aid makes it easier to see in which order

tasks will execute, even after moving them around in the layout pane.

The final part is to drag the precedence constraints from Read Tables

to each of the benchmark tasks, and then double click each link (or right click

and "Edit...", or use the properties window to set the corresponding values) and

configure them like this, changing the number to "1" through "5" for the five

links:

When we run this, the execution order will be:

Read Tables
1. Simplex Benchmark
Read Tables
2. Duplex Benchmark
Read Tables
3. Complex Benchmark
Read Tables
4. Huge Benchmark
Read Tables
5. Scary Benchmark

Now this seems like a perfect fit for my problem in terms of functionality

and complexity, so as long as I only need this simple sequence, this is what

I'll use.

Tip: In the Precedence Constraint Editor above we specified

Evaluation operation = "Expression". This has the side effect of

allowing the benchmark tasks to run even if the Read Tables task

fails! An alternative is to set Evaluation operation = "Expression

and Constraint" - this will stop the precedence from triggering unless

Read Tables actually succeeds.

Tip: You can only set a precedence constraint if you have a preceding task!

For instance, if you needed to run Read Tables after each benchmark, you would still need a preceding task to set the constraints against

- simply use a no-op as a preceding task, such as a Sequence Container with nothing in it.

But requirements change! What if I need to run three benchmark tasks in a

more complex sequence? Let's investigate the

Downright complex and best avoided for loop

Can it handle this?

Complex Benchmark
Duplex Benchmark
Simplex Benchmark in parallel with Duplex Benchmark
Simplex Benchmark in parallel with Complex Benchmark
Duplex Benchmark in parallel with Complex Benchmark
Complex Benchmark
Duplex Benchmark

Here we have both repeating sequences of tasks, and tasks that sometimes run

sequentially, and sometimes in parallel.

One way to do this is to start with the Not

so simple for loop, and add more complex precedence expressions:

We have seven steps, so make the loop variable Step go from 1 to 7
For every benchmark, check which steps it should run in, i.e. Complex
Benchmark runs in step 1, 4, 5, 6
For every benchmark, set the precedence expression to evaluate to true for the identified
steps, i.e. "@Step==1 || @Step>=4 || @Step<=6"

Are we having fun yet??? Depending on your answer, feel free to impress me

greatly (and get weird looks from your spouse I bet) by having that table driven

Finite State Machine on my desk by Monday, 9am sharp! To implement it

though, instead of Downright complex... I suggest you use the

Getting silly for loop

In the previous Downright complex... for loop we had our control logic

spread out across many precedence expressions. That's much too easy to get

wrong, so let's refactor again. Now we have centralized all the logic in

the single script called Logic, where we can configure arbitrary control

flows:

To handle the complex sequence, construct it thusly:

Configure the for loop to iterate Step from 1 to 7, corresponding
to our seven steps in the sequence. Also add an integer variable BitMask to the
for loop.
Name the benchmark tasks as 1, 2, 4 (continuing with 8, 16, 32, 64 etc.
for any additional tasks)

Add code to the Logic script to set the bits in BitMask

according to which step in the sequence is executing - a set bit means the

corresponding benchmark task will execute in the current step:

Enum Bits
  SimplexBenchmark = 1
  DuplexBenchmark = 2
  ComplexBenchmark = 4
End Enum

Select Case CType(Dts.Variables("Step").Value, Integer)
  Case 1
    Dts.Variables("BitMask").Value = Bits.ComplexBenchmark
  Case 2
    Dts.Variables("BitMask").Value = Bits.DuplexBenchmark
  Case 3
    Dts.Variables("BitMask").Value = Bits.SimplexBenchmark + Bits.DuplexBenchmark
  Case 4
    Dts.Variables("BitMask").Value = Bits.SimplexBenchmark + Bits.ComplexBenchmark
  Case 5
    Dts.Variables("BitMask").Value = Bits.DuplexBenchmark + Bits.ComplexBenchmark
  Case 6
    Dts.Variables("BitMask").Value = Bits.ComplexBenchmark
  Case 7
    Dts.Variables("BitMask").Value = Bits.DuplexBenchmark
End Select

Set the precedence expressions for the three benchmark tasks to trigger
when their corresponding bit in BitMask is set:
- (@BitMask & 1) != 0
- (@BitMask & 2) != 0
- (@BitMask & 4) != 0

Tip: Remember that we can have other tasks and containers running

in parallel with our for loop - they can signal external events to our

Logic

script via variables for instance. Just don't go overboard with this unless

it really is for a good cause...

Tip: Instead of using the for loop

AssignExpression to change Step, you might want to use the Logic

script to update it - that's especially useful when you want to 'jump

around' in the sequence, say "Go from step 5 to step 2 if today is Saturday

or Sunday", and when the sequence should 'run forever'.

No downloadable package today, I've left that as an exercise to you the

reader (I really hate it when writers use cop outs like that. Which is

exactly why I did say that, just now. Don't get mad, get even.-)

All in all, it is a bit silly, but in an irresistible kind of way -

just like my wife actually, and we had Love At First Sight! So don't be

surprised if this thingumajig pops up again.

Stop, I'm gagging!

I'll give that poor loop a rest for a while. As a heads up though, here are

some more ways to implement reuse:

Using the loop counter to effect different behavior inside a task
Putting SQL code in variables and external files
Calling child packages
Using event handlers
Writing .NET assemblies
Creating custom components

These are all pretty hefty subjects in their own right; we'll save them for

another day.

Ergo sum

Precedence constraints can not have circular paths
For loops are very useful for implementing reuse as well as for
implementing simple and complex control flows
Understand the various ways of doing reuse - which technique and what
level of complexity is appropriate in your case, taking the foreseeable future
of your implementation into account?
The Getting silly for loop is one way to implement Finite State
Machines in SSIS

There is truly an enormous amount of functionality packed inside Integration

Services, to the extent that it feels more like a 2.7 version than the 1.0 it

really is. As we've seen, there are many ways to accomplish code reuse, which is

lucky indeed, 'cause as you know, I sure don't like the sound of that

growling beast...

Kristian Wedberg is a Business Intelligence consultant living outside London,

UK. He can be reached at this email address: trk2061 [at] wedberg.name

Disclaimer: Opinions and comments expressed herein are my own, and does not

necessarily represent those of my employer.

SSIS - Code Reuse and Complex Control Flows

Copy&paste

Simple for loop

Not so simple for loop

Downright complex and best avoided for loop

Getting silly for loop

Stop, I'm gagging!

Ergo sum

Rate

Share

Categories

Share

Rate

SSIS - Code Reuse and Complex Control Flows

Rate

Share

Categories

Share

Rate

Related content

Cookies on SQLServerCentral