What transform to use when source data needs splitting up?

  • My source data is in one .csv file and needs to be split up and loaded into two separate normalized destination tables. Additionally, one of the destination tables contains a new column which does not exist in the source data, and will need to be populated on the fly with a controlled set of values, fully dependent on another value in the source csv file.

    Which transform do I use to split the 1 csv tile into two streams?

    Which transform do I use to look up a column in the csv file and based on what it fined, assign a value for a new column contained in one of the two destination tables?

    Thank you.

  • Split --> conditional split (what's in a name?)

    Regarding the lookup, that's a bit more difficult. Does the value you need to lookup exist on the same row? If yes, you could use a derived column.

    If no, I think you should probably use a MERGE JOIN (just like a join in SQL). For this component to work, the input data needs to be sorted.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Funny, ha ha 🙂 I was using these exact ones for that problem with some difficulty. Now I know I was on the right track.

    I will work on getting the conditional split working first.

  • I just put Conditional Split back on the palette and I remember the problem: It will only allow one arrow for output. So, am I to use two conditional splits for each half of the data?

    Additionally, I want those columns that do map to simply pass through, and have no need to apply an expression.

  • hxkresl (3/18/2011)


    I just put Conditional Split back on the palette and I remember the problem: It will only allow one arrow for output. So, am I to use two conditional splits for each half of the data?

    Additionally, I want those columns that do map to simply pass through, and have no need to apply an expression.

    At the conditional split you can define multiple outputs: each time you enter an expression, it maps to a single output. You can put multiple expressions below each other, where the top one will be evaluated first.

    There is also a default output, for all the rows that match none of the expressions.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply