Linked Servers vs. Integration Services - Pros and Cons

  • We have a lot of linked servers in our environment, and we (the DBAs) have recently started pushing our developers to create SSIS packages for instance-to-instance data transfers, instead of relying on linked servers.

    There is some resistance to this effort: people are questioning the wisdom of switching from linked servers to SSIS.

    My question is, what are the benefits of SSIS vs. linked servers?

    Is there a performance benefit and in which scenarios?

    I know SSIS uses BULK INSERT by default, and that speeds up insert operations.

    I am also aware of the security benefits of SSIS versus linked servers.

    Are there other benefits, performance-related or otherwise?

    Any reference links on this issue would be appreciated.

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • I like both but each has its own purpose..

    Many (most) environments end up with a mix. With Linked servers being used less.

    For data that you need the absolute most current version then a linked server MIGHT be the right answer, evaluate how current the data REALLY needs to be. Most lookup data can wait a day. Or if you need more current keep a copy locally and sync every hour.

    ANY time you are transfering large amounts of data, SSIS is the default choice, think long and hard before using linked servers, SSIS will ALWAYS (never found a case it wasn't true..) be faster when using BULK INSERTs.

    Linked servers also create disaster recovery issues, can your app operate without ALL servers being available, local copies of data can be great.

    Linked servers also potentially create issues with moving code from dev -> qc -> prod, since a linked server, without some other effort will always reference a specific server.

    Evaluate each use of a linked server, find out what and how much is transfered and how current the data must be.

    These are the basis for the decision to stay linked server or move to SSIS.

    CEWII

  • Elliott W (8/12/2009)


    I like both but each has its own purpose..

    Many (most) environments end up with a mix. With Linked servers being used less.

    For data that you need the absolute most current version then a linked server MIGHT be the right answer, evaluate how current the data REALLY needs to be. Most lookup data can wait a day. Or if you need more current keep a copy locally and sync every hour.

    ANY time you are transfering large amounts of data, SSIS is the default choice, think long and hard before using linked servers, SSIS will ALWAYS (never found a case it wasn't true..) be faster when using BULK INSERTs.

    Linked servers also create disaster recovery issues, can your app operate without ALL servers being available, local copies of data can be great.

    Linked servers also potentially create issues with moving code from dev -> qc -> prod, since a linked server, without some other effort will always reference a specific server.

    Evaluate each use of a linked server, find out what and how much is transfered and how current the data must be.

    These are the basis for the decision to stay linked server or move to SSIS.

    CEWII

    Thanks for the response, you raise some interesting points.

    Any tips on evaluating use of linked servers?

    Should I use SQL trace to track linked-server use?

    Any other ways I could go about monitoring linked-server traffic?

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • I don't know of any tools, but the first problem is figuring out what is using linked servers, you can do a search of the syscomments using the LIKE statement. You search for the Linked server name and that will tell you all objects using each linked server. Once you have that then you can do additional research.

    CEWII

  • Elliott W (8/12/2009)


    I don't know of any tools, but the first problem is figuring out what is using linked servers, you can do a search of the syscomments using the LIKE statement. You search for the Linked server name and that will tell you all objects using each linked server. Once you have that then you can do additional research.

    CEWII

    Thanks for your input.

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • This link looks interesting:

    http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/ce92ab37-b051-4f9e-9072-9fce685c83c0

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • The issue in that link is Oracle. Are you using Oracle? Are pushing data to it or from it, to it then that article probably applies, from it probably not..

    CEWII

  • I'm not using ORACLE, but the link contained interesting info all the same.

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • Marios Philippopoulos (8/12/2009)


    We have a lot of linked servers in our environment, and we (the DBAs) have recently started pushing our developers to create SSIS packages for instance-to-instance data transfers, instead of relying on linked servers.

    FWIW, my own (uneducated, unproven) bias is to always prefer linked servers. Any ETL process is, by nature, batch mode, and the data transferred is soon out of date. I see the distinction as between answering only the questions that are asked, using up-do-date data at the source server, versus copying all the "answers" in batch, just in case the question might be asked of the reporting server.

    Ok, you did say "instance-to-instance data transfers" implying you were batch moving "all the answers" in either case. My inclination would be to leave the "answers" at the source server, and provide views/procedures (processed locally at the source) to respond to data demands from "reporting" server. (And I am not sure that the overhead of defining and running SSIS transfers would outweigh any potential performance increases to be gained.)

    I don't expect many to agree, and I have yet see this approach implemented much in the real world; just my gut feel, distaste for batch transfers, and aversion to SSIS.

  • Jim,

    I too like linked servers, in moderation.. I think that the right tool for the right job is the important thought.

    I do agree that providing sprocs and such to answer questions is a good method, I have used it myself to great effect. But I have also had a number of cases that my lookup data does not need to be 100% current and it helps to have it available locally, in the past I satisfied that with linked servers but the IO and latency cost of pulling the data over the pipe EVERYTIME I needed it was approaching unacceptable. Since the data in almost all cases didn't have to be 100% current or I knew the data wouldn't change during the day I built a process to pull the data over and cache it locally. The package pulled the data over and cost less than 4 minutes a day once. The queries that used the data ran over 5x faster and they were called several hundred times a day.. The user "felt" the performance increase.

    But mileage varies.

  • Hey guys, good point on caching data locally at set intervals and using local queries during the day (as opposed to linked servers).

    At this point of the discussion I have a performance-related question on a simple package (SSIS 2008) I created recently.

    In the package I simply get data from one SQL instance (instance1) and deposit it in a database on another SQL instance (instance2) using a DataFlow component. The process takes about 3 hours.

    I haven't done the test, but would you expect this to be faster than if it was done through a linked server?

    The SSIS package is run as a SQL Agent job on the destination (instance2), so this can be thought of as a data pull.

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • Three thoughts:

    1. Is the destination table empty at the beginning of the load? I seem to remeber being told that this can prevent the fast load option from being used in practice.

    2. Does the Data Access Method on the destination OLE DB object read: "Table or view - Fast Load". If not then it is really doing little more than a bunch of inserts, I had this happen once, it makes a HUGE difference.

    3. How many rows and how wide are the rows?

    CEWII

  • Elliott W (8/14/2009)


    Three thoughts:

    1. Is the destination table empty at the beginning of the load? I seem to remeber being told that this can prevent the fast load option from being used in practice.

    2. Does the Data Access Method on the destination OLE DB object read: "Table or view - Fast Load". If not then it is really doing little more than a bunch of inserts, I had this happen once, it makes a HUGE difference.

    3. How many rows and how wide are the rows?

    CEWII

    Thanks, here are answers to your questions:

    1. Destination table is empty at the start of the load

    2. Data access mode for OLEDB Destination is set to "Table or view - fast load"

    3. Only about 10,000 rows are loaded into the table; the table is pretty narrow;

    there are four columns with data types, respectively:

    - int

    - tinyint

    - char(1)

    - datetime

    Last night the job ran in 3 hrs. 51 min!

    Now that I have noticed the amount of data involved, this is really bad performance...

    I wonder if there is any other config setting I have missed switching on...

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • I just noticed that I'm using the "Table or View" access mode in the OLEDB Source Adapter. According to this link this is very bad for performance:

    http://www.simple-talk.com/sql/sql-server-2005/sql-server-2005-ssis-tuning-the-dataflow-task/

    Use a SQL Select Statement to Retrieve Data From a View

    Avoid using the Table or view access mode in the OLE DB Source Adapter. It is not as performant as using a SELECT statement because the adapter opens a rowset-based on the table or view. Then it calls OpenRowset in the validation phase to retrieve column metadata, and later in the execution phase to read out the data.

    Testing has shown that using a SELECT statement can be at least an order of magnitude faster, because the adapter issues the specified command directly through the provider and fetches the data using sp_prepare without executing the command, avoiding the extra roundtrip and a possibly inappropriate cached query plan.

    I will switch to "SQL command" mode (using a SQL statement) and see if that helps.

    (In the source we are selecting all data from a view.)

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • That is shockingly bad performance. The other thought I had is what ELSE is going on on the source and destination servers at the same time, could there be other processes manipulating the source data, could there be some reorg operation going on?

    And I know this is a shameless plug, guilty, but I wrote a tool for this and is available on my website:

    http://www.novaconceptsltd.com

    The tool is called SQL Job History Visualization and you can use it for 15 days before you need to consider licensing.

    You should be able to see any job that is running on the server at the same time..

    CEWII

Viewing 15 posts - 1 through 15 (of 26 total)

You must be logged in to reply to this topic. Login to reply