Distributed Tasks

  • I have a report that's based on a stored procedure. It works fine, but it's doing joins across linked servers to pull data from serveral servers. We are wanting to put this report into production, but the production policy is no linked servers. To get around this, my best option seems to be an SSIS job that consolidates the data into one table in a reporting database. I've never done anything this complex with SSIS. Should I create a data flow task for the data on each remote server and then use like a merge join to relate the data back together?

    Thanks!

    .

  • Your solution should work, but make sure to put an ORDER BY on the join columns and mark the input as sorted in the advanced editors of the sources, as the MERGE JOIN expects sorted input.

    You can find an explanation here:

    http://www.ssistalk.com/2009/09/17/ssis-avoiding-the-sort-components/

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • I forgot:

    if the result of the query is huge, you'd be better off with importing each table seperately and then join in SQL Server itself. Using the dataflow will result in getting all the result pulled into memory (luckily not at the same time if you are not using blocking transformations), which can be troublesome if you have a lot of rows.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • You have a few options:

    Option 1: As Koen mentioned, use the Merge Join transform with multiple data sources, and make sure they are ordered correctly. Advantage: Simple to set up, and pretty intuitive. Disadvantage: Merge join can be very slow if you're dealing with large recordsets.

    Option 2: Again as Koen mentioned, have a data flow task that just takes as a source the remote tables, and puts them into a local table on one server, then manipulate them accordingly. Advantage: With all the data being local, and hopefully indexed properly, your transformations will be very fast, since SQL Server does the query work much better than SSIS. Disadvantage: You have to deal with importing the data to your local table properly. You can truncate and do a mass dump, but this could be messy if you have a large number of records. On the other hand, if you try to only insert new records, and you have no easy way of distinguishing them, it could be difficult to pull off.

    Option 3: Use the Lookup transform to find the values you need from the different servers, and then work with them. Advantage: Very fast, since SSIS is great at doing in-memory lookups. Disadvantage: Takes up memory. If your server is limited on memory, or the amount of data being cached for the lookups is large, this could cause serious server problems.

    Personally I think that I would go with Option 3 if you have the memory to be able to deal with it. If not, I'd go with Option 2. If you have no easy way of being able to distinguish new rows, then just create an identity field on your tables, and then put that as a key in your destination table, then only insert rows where the key doesn't match.

  • Perfect. I'll try option 3 first and see how it goes. If I have memory issues, I'll fall back to option 2.

    Thanks!

    .

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply