Joining data in SSIS using a condition other than "=" on the Join keys of a merge join component

  • I was trying to merge some data today using the merge join component. the scenario was something like this:

    select * from tb1 inner join tb2 on tb1.a=tb2.a and tb1.date >= tb2.date

    The merge join component would have worked if it wasn't for the >= condition between tb1.date and tb2.date. Does anyone know how to accomplish such thing in SSIS? Should I ommit the "tb1.date >= tb2.date" condition and then use a split conditional transformation to get rid of the records that do not satisfy this condition?

    Thanks in advance for any help you can provilde.

    Luis

  • What is the source of your data?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • If the source isn't too complicated and it is a database, I would write the join in a SQL statement in the OLE DB Source, instead of using the merge join.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • In this particular case the sources are multiple tables from different sql servers.

  • luisefigueroa (4/29/2010)


    In this particular case the sources are multiple tables from different sql servers.

    If I were you, I would just write a SQL statement.

    Much much much more performant than sorting and merge joining (is that a real word?) in your data flow.

    If you are uncomfortable with hard-coding the servernames in the SQL statement (e.g. servername.database.schema.table), then store the values in a configuration table.

    Then configure some variables through package configurations and use these variables to construct your SQL statement. The SQL statement itself is stored in a string variable and is configured by an expression.

    In your OLE DB Source, use this variable as the input for your SQL.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Thanks Da-Zero. So far it seems including the query on the OLE DB data source is the way to do this..

    I thought about this other option:

    I could omit the join condition with the “>=” operator i.e.

    from #textRaw t

    left join #actionRaw d on t.batch_id = d.batch_id

    and t.list_id = d.list_id

    and t.subscriber_id = d.subscriber_id

    And then push the resulting data set (Including the datecreated from both tables )to a split condition transformation in SSIS pushing the records that satisfy the condition “t_datecreated >= d_datecreated” down one pipeline and discarding the remaining records.

    This last option doesn’t sound like an efficient way to handle this scenario but might be the way to handle this if the exclusion of such records most happen within SSIS.

  • I'm not 100% sure, but I think that joining in the OLE DB source will be more performant than using the conditional split. The conditional split has to evaluate each row, while the database is optimized to handles such queries really fast, especially if you have the right indexes.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • I agree. Thank you for your input.

    Luis

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply