Data flow execution plan

  • Hi,

    I am writing a custom source component that has 3 outputs. My source component reads rows from a remote source and routes each row to one of the 3 outputs according to its type. Each output then feeds a sequence of transformations, ending in a destination.

    It is important for data consistency that the rows are processed all the way through in the order in which the source read them and injected them into the different outputs. I'll give an example to make it clearer:

    Say output1 is directed to transformation1 and then OLE DB destination.

    output2 is directed to transformation2 and then OLE DB destination.

    output3 is directed to transformation3 and then OLE DB destination.

    row1 is injected into output1, and then row2 is injected into output2. I need to make sure row1 reaches the destination before row2 does.

    I've been searching BOL and everybody's blogs for a few hours now, and I still don't understand what SSIS's default behavior is in such a case, or how I can influence the execution order inside a data flow.

    If someone can shed some light here I'll be really happy 🙂

    Thanks.

    Noa

  • This won't be easy... though I cannot speak to the custom source component aspects.  It seems to me that in a custom component you could use one or more script components and a series of flags to control the sequence.  I have tried doing it using the standard tasks/components on the toolbars... and found that executing similar code multiple times from within a sequence container was the only way for me to accomplish what I needed to do.

    An example:  I have two fields, attending doctor and admitting doctor, both in the same record.  I needed to do a lookup to make sure the admitting doctor existed, and insert him if not, before doing a lookup on the attending doctor and inserting him if not present.  I also had to ensure that if the admitting doctor and the attending doctor were the same person, he/she was not inserted twice.  So, obviously, the sequence was important... and the two tasks could not be done in parallel.
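    The sequential dependency described above can be sketched outside of SSIS. The following Python sketch (the table, field names, and key assignment are all hypothetical stand-ins, not the poster's actual implementation) shows why the admitting-doctor step has to complete before the attending-doctor step, and how the same-person case avoids a double insert:

    ```python
    def ensure_doctor(doctors, name):
        """Insert the doctor into the lookup table only if not already present."""
        if name not in doctors:
            doctors[name] = len(doctors) + 1  # assign a new surrogate key
        return doctors[name]

    def process_record(doctors, record):
        # The admitting doctor must be looked up / inserted FIRST, so that
        # if both fields name the same person, only one insert happens.
        admitting_key = ensure_doctor(doctors, record["admitting"])
        attending_key = ensure_doctor(doctors, record["attending"])
        return admitting_key, attending_key

    doctors = {}
    print(process_record(doctors, {"admitting": "Smith", "attending": "Smith"}))
    print(len(doctors))  # only one row inserted for the shared doctor
    ```

    Running the two lookups in parallel would break exactly this guarantee: both branches could see the doctor as missing and insert him twice.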

    Anyway, my two cents, for what it's worth (at last count, approximately two cents).  I hope this helps you formulate a direction.  Perhaps Jamie Thompson will respond, as he seems to have a lot of experience.  I'm open to learning a new technique today.


    Cheers,

    david russell

  • Noa,

    It is not possible to control the order in which rows from the same source are processed to different destinations. In fact, you cannot control which of the destinations gets inserted into first.

    The workaround here is as follows (I'm assuming that it is satisfactory to insert all records to destination1 THEN all records to destination2 THEN all records to destination3):

    • Do the insertion to destination1. Drop rows for destination2 into a raw file (called rawfile2). Drop rows for destination3 into a raw file (called rawfile3)
    • Build a new data flow that takes data from rawfile2 and inserts it to destination2
    • Build a new data flow that takes data from rawfile3 and inserts it to destination3
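    The staged approach above can be illustrated in miniature (Python, with lists standing in for the raw files and hypothetical row routing; the real thing would of course use SSIS Raw File destinations and separate data flows):

    ```python
    rows = [{"type": "1", "val": "a"}, {"type": "2", "val": "b"}, {"type": "3", "val": "c"}]

    staging = {}    # stand-ins for rawfile2 / rawfile3
    inserted1 = []  # destination1

    # Pass 1: insert type-1 rows immediately; stage everything else.
    for row in rows:
        if row["type"] == "1":
            inserted1.append(row)
        else:
            staging.setdefault(row["type"], []).append(row)

    # Passes 2 and 3: each staged "raw file" feeds its own follow-on
    # data flow, so destination1 is guaranteed to be fully loaded
    # before destination2, and destination2 before destination3.
    inserted2 = staging.get("2", [])
    inserted3 = staging.get("3", [])
    ```

    The ordering guarantee comes from the sequencing of the data flows, not from anything inside a single data flow.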

    Hope this helps.

    -Jamie

    P.S. If you don't like the idea of dropping out to external files, you can achieve the same thing in a similar way using recordsets and the Recordset destination. It's more difficult, though, because it involves writing code.


  • Thanks.

    Now, what if eventually all the rows can be inserted into the same destination?

    I have a source that brings rows in a certain order, then splits into 3 cases (either inside my source component or using a Conditional Split transformation). Each case does some transformations, and eventually some SQL command should be executed against the destination (the same destination for all cases).

    Does this make life easier for keeping the rows in the correct order?

  • Well, you could combine them into a single insertion using a UNION ALL. Thereafter you can use a SORT transform to put them in any order you like. If all your rows have a row-identifier (http://www.sqlis.com/default.aspx?37) then it will be very easy to sort them into the order that they were in prior to the data-flow executing (if that is what you want).
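    As a rough sketch of the row-identifier idea (Python; the tagging, branch logic, and row values are invented for illustration, not SSIS code): tag each row with its read order before the split, let the branches run in whatever order they like, then UNION ALL and sort on the tag to restore the original sequence.

    ```python
    rows = ["r1", "r2", "r3", "r4"]
    tagged = [(i, r) for i, r in enumerate(rows)]  # row-identifier column

    # Simulated split: the branches may finish in any order.
    branch_a = [(i, r.upper()) for i, r in tagged if i % 2 == 0]
    branch_b = [(i, r + "!") for i, r in tagged if i % 2 == 1]

    # UNION ALL, then SORT on the identifier to restore read order.
    merged = sorted(branch_a + branch_b, key=lambda t: t[0])
    ordered = [r for _, r in merged]
    print(ordered)  # ['R1', 'r2!', 'R3', 'r4!']
    ```

    Note that the sort cannot emit anything until every branch has delivered all of its rows, which is exactly the blocking behavior discussed below for the asynchronous Sort transform.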

    -Jamie


  • So let's see if I got it right:

    Once I split into different cases, I have no guarantee whatsoever that rows reaching the head of the split first will get to the end of it first (no FIFO behavior inside the split).

    Therefore, I need to merge after the split (UNION ALL), wait for all the rows, sort them according to the original order, and only then execute my SQL statements one after the other.

    Am I right?

    I'm afraid it will cause poor performance, though.

    Thanks again 🙂

  • As far as I am aware, there is no guarantee of the order in which the rows will occur once you split them into separate streams of execution. I stand to be corrected, however.

    The asynchronous SORT will have a slightly adverse effect on performance, you are right. How many rows are you dealing with?

    -Jamie


  • Do you think limiting the thread number to 1 would help?

  • No. In fact you can't do that anyway - the minimum thread number is 2.

    -Jamie
