Pulling data in one set based on existence in another

  • What SSIS objects do you use to ask a question similar to, "Return records in Table A on Sever A where the account number exists in Table B on Server B?" Before trying to accomplish this in SSIS, I was using linked servers and inner joins in a SQL Agent Job, but learned that performance with joins involving linked servers SUCKS....and was advised to look into SSIS.

    BTW: I'm pretty sure the answer is NOT to use a connection to one of the servers and then continue using four-part names with linked servers to the other server, all using an Execute SQL Task ;-).

  • I may have found an answer....Merge Joins. Sound right?

    http://www.sqlservercentral.com/Forums/Topic475238-148-1.aspx

    odd...I couldn't find a match for "inner join" using the forum search tools, but then found a link back to this forum from google with matches to "inner join"

  • Or use a lookup. Depends if you need data from Server B or just need to know if the record exists.

  • Actually, I do need some fields from both servers...but the results need to be only those records that exist in both when matching by some key fields. Server A has something like 20M records, meanwhile Server B has about 1K. I'm trying to determine the most efficient way to pull from Server A ONLY matches for the 1K.

    I just tried using a LOOKUP and I think this will work. But I'm wondering about performance. In my test, the input for the Lookup was the data from Server B (since it's smaller)....and the Lookup was configured to use the larger dataset....and it was taking a little longer than I expected. But I'm wondering if I need to swap and have Server A be the input to the Lookup and then configure the data from Server B inside the lookup.

    Advice?

  • Lookup Transformation worked...larger table as input to the Lookup Transform...smaller table configured inside the Lookup Transform.

    The only caviat was that by default the Lookup Transformation expects a match and the component fails if no match is found (0xC020901E was one of the errors). I then had to use the "Configure Error Output" of the Lookup Transformation to either redirect the output or ignore. I chose redirect without actually sending the rows anywhere. This results in a validation warning, but it seems to work.

    Still testing other options, but I'm cool with this one so far.

  • I'd be tempted just to copy the data from server B to server A, run my query with an inner join and then delete the newly copied data from server A. It's a rather low-tech solution, but it would be fast.

    --
    Scott

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply