I had a very interesting bug that had me running in circles for over a week. It turns out the bug was in my code, due to a misunderstanding of how SSIS processes the Data Flow Task. What I found in that time is that SSIS not only evaluates, but actually 'runs' the code on every path, even if there are no qualifying records on that path. Let me break this down for you.
I have a Data Flow task that has 4 possible paths that are mutually exclusive. This means that at any given time only one of the four paths is valid. The Data Flow is shown below:
I have highlighted in red the Script Destination components that were my unwitting accomplices. The Data Flow is one of several clones, and it makes two determinations that combine into the 4 paths. First, is the document I am about to process a statement or an invoice? That pushes it down the left path (invoice) or the right path (statement). Second, does that document already exist? If the document exists, count it and throw it away. If it doesn't exist, count it and insert it into the table.
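Boiled down, the routing amounts to the sketch below. This is purely an illustration in plain C#; the real package makes these splits with Data Flow components, not hand-written code:

```csharp
using System;

class RoutingSketch
{
    // Illustration only: the two decisions that combine into the
    // four mutually exclusive paths.
    record Document(bool IsStatement, bool AlreadyExists);

    static string Route(Document doc) =>
        (doc.IsStatement, doc.AlreadyExists) switch
        {
            (false, false) => "invoice, new: count it and insert it",
            (false, true)  => "invoice, duplicate: count it and throw it away",
            (true,  false) => "statement, new: count it and insert it",
            (true,  true)  => "statement, duplicate: count it and throw it away",
        };

    static void Main() =>
        Console.WriteLine(Route(new Document(IsStatement: false, AlreadyExists: true)));
}
```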
When I run this package, and specifically this Data Flow task, all boxes run green, but only 1 of the paths has a row attached. Yes, this package works on a single input at a time per Data Flow task, with 8 tasks running simultaneously (email me if you are curious why I did this, I might blog it if enough people are interested). This package imports .pdf files. Now that you understand what I was seeing, let's open up the Script Destinations and see what they look like and why they contributed to the problem.
If you look at either side you will see one branch where the document is a duplicate (it already exists in the table) and one where it is new. Listed below is a snippet from the Script Destination on the path for a new document:
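(In sketch form rather than the verbatim script: the table and column names, dbo.Documents and DocHash, and the connection string are stand-ins, but the shape follows the description that comes after.)

```csharp
using System.Data.SqlClient;

public class ScriptMain : UserComponent   // SSIS-generated base class
{
    // Placeholder connection string; the real one comes from the package.
    private const string ConnString =
        @"Data Source=.;Initial Catalog=Docs;Integrated Security=SSPI;";

    private int webunique;   // local copy of the new document's id

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Insert the new document and read back the identity value.
        using (var conn = new SqlConnection(ConnString))
        using (var cmd = new SqlCommand(
            "INSERT INTO dbo.Documents (DocHash) VALUES (@hash); " +
            "SELECT CAST(SCOPE_IDENTITY() AS int);", conn))
        {
            cmd.Parameters.AddWithValue("@hash", Row.DocHash);
            conn.Open();
            webunique = (int)cmd.ExecuteScalar();
        }
    }

    public override void PostExecute()
    {
        base.PostExecute();
        // A Script Component can only write its ReadWriteVariables here,
        // so the handoff from the local variable to the package variable
        // happens in PostExecute.
        Variables.Engine0webunique = webunique;
    }
}
```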
The important part of this script is the insert. I insert a record into a table and get back an auto-increment number (Int32) as an id to further process the document. I actually have a unique 20-byte binary (a hash), but it seems the Web program they use to display the .pdfs I store cannot properly pull/store that unique identifier, so I had to create another unique ID, but I digress (always wanted to say that 🙂 ). That number is transferred from the local variable webunique to the package variable Variables.Engine0webunique. This works fine EXCEPT when I decided to do what most programmers do: assign a value on every path.
The code listed below is the corrected version, but it carries an annotation of what I actually had.
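(Again a sketch rather than the verbatim script; this is the PostExecute of the duplicate-document path's Script Destination.)

```csharp
public override void PostExecute()
{
    base.PostExecute();
    // WHAT I ACTUALLY HAD (the bug):
    //     Variables.Engine0webunique = 0;
    // PostExecute fires on this path even when zero rows flowed down it,
    // and the four paths finish in no guaranteed order, so that 0 could
    // overwrite the real id the insert path had already stored.
}
```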
When I had the line 'Variables.Engine0webunique = 0' in place, a problem arose, one that required me to think in parallel.
The problem was this: all paths in the Data Flow, 4 in mine, must run to completion (all green), even the paths that received no rows, and there is no way to guarantee the order in which they will complete. So the Variables.Engine0webunique value was being set effectively at random: it could end up as the actual unique id pulled back from the table insert, or as the 0 set by the 'document already exists' path. What do I do to solve this?
I needed a way to determine if I actually did get a value from the insert of a new document. I moved away from checking the actual value (is it > 0?) and added a flag. This flag would be set to true if I inserted a record and false if I did not. This enabled me to skip a step only needed on insert (for improved efficiency). Listed below is the outer framework code that uses this flag:
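(Sketched here as a Script Task snippet in the control flow; the flag's variable name, User::Engine0inserted, and the RunAncillaryProcedure helper are assumptions for illustration, not names from the actual package.)

```csharp
// Read the flag set by the Data Flow's insert path.
bool inserted = (bool)Dts.Variables["User::Engine0inserted"].Value;

if (inserted)
{
    // New document: run the ancillary step with the id handed back
    // by the insert path.
    int webunique = (int)Dts.Variables["User::Engine0webunique"].Value;
    RunAncillaryProcedure(webunique);   // hypothetical stand-in
}
// Existing document: skip the ancillary call entirely.
```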
As you can see, the framework checks the state of the flag (true/false) to determine on each run whether the document is new or existing, runs the ancillary procedure needed by new documents, and bypasses that call when the document already exists.
Moral of this post: when setting variables in a Data Flow Task, remember that all paths run, in an unknown order, whether or not they carry rows, so find a mechanism to know which paths actually had data on them.
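One way to build that mechanism (a sketch, using the same assumed Engine0inserted flag variable): have each Script Destination remember whether ProcessInputRow ever fired, and only touch package variables when it did.

```csharp
private bool sawRows;   // did any rows actually reach this path?

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    sawRows = true;     // ProcessInputRow only fires for real rows
    // ... per-row work ...
}

public override void PostExecute()
{
    base.PostExecute();
    if (sawRows)        // leave the package variables alone otherwise
    {
        Variables.Engine0inserted = true;
    }
}
```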