Need an alternative to merge join

  • I have large files. One with 100 Million records and the other with 7 Million. The Merge Join is taking too long to complete. Is there an alternative faster way of joining these files?

  • I think I found a way to get the job done without taking so much time using a merge join. I added anothe data flow where I load the contents of the larger file into a cache transform. Then once that data flow completes I execute my original dataflow and use a lookup transform to get the data I need from the Cache using the Cache Connection used to cache the larger file in the first data flow.

  • I would load the data from the files into a table, then use a sql task to create some indexes on the tables, and then join them using the appropriate JOIN deepening on your requirements. Then run a sql task to remove the tables.

    This should run a lot faster than doing the processing entirely in SSIS

  • Will give it a try...Thanks

  • steveb (5/5/2009)


    I would load the data from the files into a table, then use a sql task to create some indexes on the tables, and then join them using the appropriate JOIN deepening on your requirements. Then run a sql task to remove the tables.

    This should run a lot faster than doing the processing entirely in SSIS

    That's what I would do too - except for the remove tables bit. I'd drop the indexes & then truncate - or truncate & then drop - not sure which is faster. Then the tables are empty and ready for the next load - if you are doing this regularly, of course.

    Phil

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply