Linked Server "Data Pull" vs "Data Push"

  • Hi,

    I'm a self taught developer using SQL Server 2000 and 2005. At present my employer is doing a project which amoung other things moves data between various servers. In testing various algorithms I noticed that two functionally identical scripts perform drastically differently, and I was wondering if anyone knew why? (I have a guess at the end of the post)

    Cheers,

    Mat.

    The Scenario:

    There are two servers, each with one instance of SQL Server 2000. Each SQL Server has a linked server set up with access to the other. Each has one database with one table, they are identical;

    CREATE TABLE test (id INT NOT NULL UNQIUE, val INT NOT NULL)

    One table is empty and the other is full. I need to copy part of the data in to the empty table.

    The Pull Option:

    Run the following query on the server with the empty table.

    INSERT INTO [local_svr].[db].[dbo].[test]

    SELECT * FROM [remote_svr].[db].[dbo].[test] WHERE id BETWEEN 1000 AND 2000

    The Push Option:

    Run the following query on the server with the populated table.

    INSERT INTO [remote_svr].[db].[dbo].[test]

    SELECT * FROM [local_svr].[db].[dbo].[test] WHERE id BETWEEN 1000 AND 2000

    The Difference:

    In the case of a PULL, everything is extremely quick.

    (Even if the tables actually have multiple constraints, etc.)

    In the case of a PUSH, everything slows down by about 10 fold.

    The Questions:

    Is this a facet of some options on the SQL Servers and/or their Linked Servers?

    Is it simply due to how linked servers work in terms of which server does what tasks, etc?

    My Best Guess:

    In a PUSH the local server may be supplying only one row of data at a time. This is checked by the remote server while being inserted. Then everything repeats.

    In a PULL the remote server may execute the "SELECT x FROM y WHERE z" query, return the results as single batch and the local server deal with all the constraints, etc.

  • Hi Mat,

    I think the push is slower because there is would probably be distributed transactions involved.

    In most cases when you select data from a remote server and insert it into a local table there is usually no distributed transaction required.

    I say 'usually' because in the past I have had to fool Sql Server into not starting a distributed transaction by inserting the remote data set into a temp table and then into the local table.

    Also if you want to test your theory of the push inserting one row at a time on the remote server you can always make a trigger on the remote table and log the number of rows inserted.

    I have a feeling that the push is inserting all the rows at the same time and it is the overhead of the distributed transaction that is slowing things down.

    You can run a profiler trace while you are testing this and it will definately tell you if there are any distributed transactions occurring.

    Thanks,

    Andy

  • would anyone be able to give me a brief overview of what it would take to set up linked servers?

    I am working with someone who is using sql 2000 and i am using 2005.

    what kinds of security and roles should i have set? im currently usings windows auth.

    just wanting to know how generally to do it and any helpful advice anyone may have on the matter.

    thanks.

  • You can link servers by going into security in the enterprise manager for 2000. I'm not sure if you can link a 2005 from 2000 though. You may have to link a 2000 from 2005. More experience folks here can help you. I had to comment though, to tell you that you'll probably have more experienced folks help you if you posted this topic in a new thread where they'll see it better.

  • In the case you're talking about - you should also consider continuing to use an OPENQUERY syntax, and/or make sure that the collation compatible setting is ON in your linked server definition. If you don't pay attention, you may be pulling the ENTIRE table over the wire to then process the WHERE statement locally (so - pull 1M to only select 10K). I'm sure you could see how that wouldn't be ideal.

    Of course - this does argue for the PULL version since that would usually come across as:

    INSERT INTO [local_svr].[db].[dbo].[test]

    SELECT * FROM OpenQuery(remote_svr, 'select * from [db].[dbo].[test] WHERE id BETWEEN 1000 AND 2000')

    I don't think you can do a INSERT INTO OPENQUERY(...,.....) syntax.

    Either way - look at the execution plan and see where each piece is executing. The simple rule of thumb is - do the local work local and the remote work remotely when at all possible. So - procwess the WHERE on the server the table is on, pass the results over, and let the local server run the INSERT.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Did not try "INSERT INTO OPENQUERY", but "DELETE FROM OPENROWSET" works perfectly in my application.

    _____________
    Code for TallyGenerator

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply