August 24, 2009 at 1:16 pm
Hello,
I am experiencing a rather frustrating issue in a data flow I am designing and was wondering whether anybody else had run into it and how they worked around it.
I have a simple data flow task that fetches rows sorted by a datetime column. Some processing is performed, and then a Conditional Split calls different OLE DB Commands to update my table based on a 'Type' column (e.g. if Type = 1, update one column; if Type = 2, update another column; and so on).
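For illustration, each OLE DB Command just runs a parameterized update along these lines (the table and column names here are simplified, not my real schema):

-- output for Type == 1
UPDATE dbo.MyTable SET ColumnA = ? WHERE Id = ?;

-- output for Type == 2
UPDATE dbo.MyTable SET ColumnB = ? WHERE Id = ?;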
My issue is with how the conditional split behaves: it processes data based on the order of its conditions. For instance, if I have 3 rows with Types 1, 2, 1 and my first condition is 'Type == 1', then both Type 1 rows are sent further down the pipeline before the middle row with Type == 2. This is quite problematic for me because I want my update statements to be executed in the original order (i.e. by my datetime column).
The only way I found to work around this was to change the DefaultBufferMaxRows property on the whole data flow task. I was wondering if anybody had run into this behavior of the conditional split and had found better ways to handle it?
Thanks,
Greg
August 24, 2009 at 2:24 pm
You might consider using a Sort Transform for each output of your Conditional Split. You could sort by the DateTime column to guarantee order of insertion into your destination.
hth,
Tim
Tim Mitchell
TimMitchell.net | @tmitch.net | Tyleris.com
ETL Best Practices
August 24, 2009 at 6:35 pm
Hi,
Thanks for the answer. The issue I have is that the data is no longer processed in its original order once it reaches the conditional split. If my data comes in the order A, B, C but my order of conditions is A, C, B, then A, C, B is the order in which the data is actually processed after the conditional split. I'm guessing this is because SSIS wants to process all 'A', then all 'C', then all 'B' for performance reasons, but in my case that unfortunately does not help me 🙁
Or maybe I haven't understood your suggestion? If so, I apologize; please do elaborate a bit more if possible.
Thanks,
Greg
August 25, 2009 at 6:25 am
Greg,
If you're sending the rows to different outputs, would it make a difference if all of the C values are processed before the B values? You might do some checking to verify that, within each distinct value, the rows keep the same relative order - for example, that the first A value in your source is indeed the first A value in your output.
In either case, I don't think you can be assured of an output order without using a sort.
hth,
Tim
Tim Mitchell
TimMitchell.net | @tmitch.net | Tyleris.com
ETL Best Practices
August 26, 2009 at 9:38 am
You may be able to eliminate the conditional split and replace it with derived columns that use expressions to determine what goes where. For example, DerivedColumnA's expression could be "TYPE==1 ? InputColumnX : 0" and DerivedColumnB's expression could be "TYPE==2 ? InputColumnX : 0".
As for the order of the output, are you using multiple OLEDB destinations that all use the same connection manager? If so, then I suspect the issue is not the order of the output, but rather the order in which the OLEDB updates are committed. I've found that when I use multiple OLEDB destinations (DestA, DestB, DestC), updates are committed one destination at a time. This would cause records sent to DestA to be committed before DestB's records, which would in turn be committed before DestC's records.
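To sketch the first idea a bit further (the table, column, and key names below are just placeholders): if you map Type, the two derived columns, and the row's key into a single OLE DB Command, every row stays on one path in its original order, and the branching happens inside the statement instead of inside the data flow:

UPDATE dbo.MyTable
SET ColumnA = CASE WHEN ? = 1 THEN ? ELSE ColumnA END,
    ColumnB = CASE WHEN ? = 2 THEN ? ELSE ColumnB END
WHERE Id = ?;

Here the parameters would be mapped, in order, to Type, DerivedColumnA, Type again, DerivedColumnB, and the key column.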
August 26, 2009 at 9:45 am
Hi,
Well, using a data viewer, I can clearly see that when the data viewer pauses the process, the rows are sent in sequence based on the order of the conditions and not in the original dataset order, even if I put a Sort transform right before the split.
In the end, I used something somewhat similar to the derived column approach you were talking about and got it to work, but I was a bit surprised by this behavior, to be honest.
Thanks,
Greg