Parallel query execution from linked servers

Question

Parallel query execution from linked servers

mtlsql

SSCommitted

Points: 1528
More actions
July 7, 2015 at 3:31 pm

#319638

I am a developer (not DBA) for a company, and have inherited the following scenario: A database was created on a Virtual Machine. The VM was then cloned twice, and sent to other geographic locations. I have another SQL Server 2008 R2 server where I have linked servers set up to the other 3 (we'll call them SRV_A, SRV_B, and SRV_C). I have a parameterized query that runs against all 3 servers, does some other calculations (beyond the scope of this discussion), and combines the output. To achieve this, I create a table-type variable, and insert the results of each of the three queries into it. A grotesquely oversimplified example looks like this:
DECLARE @Table TABLE (...);
PRINT CAST(GETDATE() AS VARCHAR) + ': Getting SVR_A data...';
INSERT INTO @Table
SELECT * FROM SRV_A.DB.dbo.Tbl;
PRINT CAST(GETDATE() AS VARCHAR) + ': Getting SVR_B data...';
INSERT INTO @Table
SELECT * FROM SRV_B.DB.dbo.Tbl;
PRINT CAST(GETDATE() AS VARCHAR) + ': Getting SVR_C data...';
INSERT INTO @Table
SELECT * FROM SRV_C.DB.dbo.Tbl;
SELECT * FROM @Table;
It works fine (but really slow), but based on the output of the PRINT statements, SRV_B waits for SRV_A to finish before it starts, and SRV_C waits for SVR_B.
MY QUESTION: Is there any way to get these queries, coming from 3 different linked servers, to execute in parallel? I've searched online, and found vague references to a MAXDOP query hint, but no examples of how it works or if it applies to this particular situation.
By the way, before proposing alternatives such as converting to things like replication or mirroring, it won't work for reasons more political than technological; again, I'm a developer, not a DBA, and don't have the authority for such actions. So, distributed queries is the way I have to go.
Thank you!

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

ScottPletcher SSC Guru Points: 101235 More actions · Answer 1

Not directly, at least not easily.

But you could put each in a separate job and start the three jobs -- the sp_start_job command runs and returns immediately, without waiting for the job to start.

But you'd want to use a real table rather than a table variable. Most convenient is to create a permanent table in tempdb (hopefully you have permissions for this).

I'd also include the server name in the data being inserted and cluster the table on that, to isolate the inserts from each other. For example:

PRINT CAST(GETDATE() AS VARCHAR) + ': Getting SVR_A data...';

INSERT INTO tempdb.dbo.server_data

SELECT 'SRV_A' AS server_name, * FROM SRV_A.DB.dbo.Tbl;

When you're done with the table, you can truncate it (or just drop it).

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Grant Fritchey SSC Guru Points: 399379 More actions · Answer 2

The MAXDOP hint is a way of adjusting the degree of parallelism for a given query. It won't force parallelism onto a query. To do that, you have two options. Adam Machanic has a script that forces plans to have a very high cost, which will make a parallel plan. Or, there's an undocumented traceflag that will force a plan to have zero cost, which will result in a parallel plan.

I'm not sure either of these is really what you're looking for though. I suspect you may need some query and index tuning to make things run faster.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

doug 88182 Old Hand Points: 327 More actions · Answer 3

I have a similar challenge.

I have 40 servers.

Each server has a [dbadb] database.

Each database has a view named v_DatabaseFiles that shows information on the database files & mountpoint free space.

I have a management group in SSMS that links to all 40 servers at once.

I also have one server that can link to the other 39.

When I run select * from v_DatabaseFiles from an SSMS query that is attached to all 40 servers, the execution takes less than a second and I get back about 1000 rows.

I'd like to set up a reporting services report that displays the view, but RS doesn't seem to support a data source that hits multiple servers.

I tried making a view like this:

[font="Courier New"]create view dbo.dist_DatabaseFiles

as

select * from [server1].dbadb.dbo.v_DatabaseFiles

union all select * from [server2].dbadb.dbo.v_DatabaseFiles

union all select * from [server3].dbadb.dbo.v_DatabaseFiles

union all select * from [server4].dbadb.dbo.v_DatabaseFiles

...

union all select * from [server40].dbadb.dbo.v_DatabaseFiles [/font]

and the execution plan seems to suggest that it will parallelize.

The problems are that:

1) It still seems to query each v_DatabaseFiles in series, making the execution time take about 30x longer. (10 seconds instead of .3 seconds)

2) If one of the servers is inaccessible, the view breaks

10 seconds isn't a deal breaker for a report, but it is noticeable on a report.

I've tried playing around with other methods such as dynamic query generation (only union in the servers that are accessible) , or cycling through a list of queries and doing a RPC remote proc execution. Those methods are more robust because I can test each linked server before querying, but they are still much slower.

I guess I probably have to see what I can do with Data Collector to centralize the info before reporting, but it adds to the complexity and maintenance. Maintaining that many linked servers also sucks, but it is easier to teach a DBA how to maintain a list of linked servers than it is to teach them to maintain Data Collector.

edit:

This link was interesting-- http://www.sommarskog.se/query-plan-mysteries.html#linkedservers

I'm still trying to wrap my head around it. My management studio connection had sa rights, while the linked servers were using credentials with minimal permissions. The view has a subquery to pull the most recent record off the history table, so looks like i need to do more research.

curious_sqldba SSC-Dedicated Points: 36502 More actions · Answer 4

doug 88182 (8/24/2015)
I have a similar challenge.
I have 40 servers.
Each server has a [dbadb] database.
Each database has a view named v_DatabaseFiles that shows information on the database files & mountpoint free space.
I have a management group in SSMS that links to all 40 servers at once.
I also have one server that can link to the other 39.
When I run select * from v_DatabaseFiles from an SSMS query that is attached to all 40 servers, the execution takes less than a second and I get back about 1000 rows.
I'd like to set up a reporting services report that displays the view, but RS doesn't seem to support a data source that hits multiple servers.
I tried making a view like this:
[font="Courier New"]create view dbo.dist_DatabaseFiles
as
select * from [server1].dbadb.dbo.v_DatabaseFiles
union all select * from [server2].dbadb.dbo.v_DatabaseFiles
union all select * from [server3].dbadb.dbo.v_DatabaseFiles
union all select * from [server4].dbadb.dbo.v_DatabaseFiles
...
union all select * from [server40].dbadb.dbo.v_DatabaseFiles [/font]
and the execution plan seems to suggest that it will parallelize.
The problems are that:
1) It still seems to query each v_DatabaseFiles in series, making the execution time take about 30x longer. (10 seconds instead of .3 seconds)
2) If one of the servers is inaccessible, the view breaks
10 seconds isn't a deal breaker for a report, but it is noticeable on a report.
I've tried playing around with other methods such as dynamic query generation (only union in the servers that are accessible) , or cycling through a list of queries and doing a RPC remote proc execution. Those methods are more robust because I can test each linked server before querying, but they are still much slower.
I guess I probably have to see what I can do with Data Collector to centralize the info before reporting, but it adds to the complexity and maintenance. Maintaining that many linked servers also sucks, but it is easier to teach a DBA how to maintain a list of linked servers than it is to teach them to maintain Data Collector.
edit:
This link was interesting-- http://www.sommarskog.se/query-plan-mysteries.html#linkedservers
I'm still trying to wrap my head around it. My management studio connection had sa rights, while the linked servers were using credentials with minimal permissions. The view has a subquery to pull the most recent record off the history table, so looks like i need to do more research.

If this is something you want execute manually instead of subscription you could do something mentioned here

curious_sqldba SSC-Dedicated Points: 36502 More actions · Answer 5

mtlsql (7/7/2015)
I am a developer (not DBA) for a company, and have inherited the following scenario: A database was created on a Virtual Machine. The VM was then cloned twice, and sent to other geographic locations. I have another SQL Server 2008 R2 server where I have linked servers set up to the other 3 (we'll call them SRV_A, SRV_B, and SRV_C). I have a parameterized query that runs against all 3 servers, does some other calculations (beyond the scope of this discussion), and combines the output. To achieve this, I create a table-type variable, and insert the results of each of the three queries into it. A grotesquely oversimplified example looks like this:
DECLARE @Table TABLE (...);
PRINT CAST(GETDATE() AS VARCHAR) + ': Getting SVR_A data...';
INSERT INTO @Table
SELECT * FROM SRV_A.DB.dbo.Tbl;
PRINT CAST(GETDATE() AS VARCHAR) + ': Getting SVR_B data...';
INSERT INTO @Table
SELECT * FROM SRV_B.DB.dbo.Tbl;
PRINT CAST(GETDATE() AS VARCHAR) + ': Getting SVR_C data...';
INSERT INTO @Table
SELECT * FROM SRV_C.DB.dbo.Tbl;
SELECT * FROM @Table;
It works fine (but really slow), but based on the output of the PRINT statements, SRV_B waits for SRV_A to finish before it starts, and SRV_C waits for SVR_B.
MY QUESTION: Is there any way to get these queries, coming from 3 different linked servers, to execute in parallel? I've searched online, and found vague references to a MAXDOP query hint, but no examples of how it works or if it applies to this particular situation.
By the way, before proposing alternatives such as converting to things like replication or mirroring, it won't work for reasons more political than technological; again, I'm a developer, not a DBA, and don't have the authority for such actions. So, distributed queries is the way I have to go.
Thank you!

SQL Server is sequential, it will run in the order script is lined in a batch.

Grant Fritchey SSC Guru Points: 399379 More actions · Answer 6

doug 88182 (8/24/2015)
I have a similar challenge.
I have 40 servers.
Each server has a [dbadb] database.
Each database has a view named v_DatabaseFiles that shows information on the database files & mountpoint free space.
I have a management group in SSMS that links to all 40 servers at once.
I also have one server that can link to the other 39.
When I run select * from v_DatabaseFiles from an SSMS query that is attached to all 40 servers, the execution takes less than a second and I get back about 1000 rows.
I'd like to set up a reporting services report that displays the view, but RS doesn't seem to support a data source that hits multiple servers.
I tried making a view like this:
[font="Courier New"]create view dbo.dist_DatabaseFiles
as
select * from [server1].dbadb.dbo.v_DatabaseFiles
union all select * from [server2].dbadb.dbo.v_DatabaseFiles
union all select * from [server3].dbadb.dbo.v_DatabaseFiles
union all select * from [server4].dbadb.dbo.v_DatabaseFiles
...
union all select * from [server40].dbadb.dbo.v_DatabaseFiles [/font]
and the execution plan seems to suggest that it will parallelize.
The problems are that:
1) It still seems to query each v_DatabaseFiles in series, making the execution time take about 30x longer. (10 seconds instead of .3 seconds)
2) If one of the servers is inaccessible, the view breaks
10 seconds isn't a deal breaker for a report, but it is noticeable on a report.
I've tried playing around with other methods such as dynamic query generation (only union in the servers that are accessible) , or cycling through a list of queries and doing a RPC remote proc execution. Those methods are more robust because I can test each linked server before querying, but they are still much slower.
I guess I probably have to see what I can do with Data Collector to centralize the info before reporting, but it adds to the complexity and maintenance. Maintaining that many linked servers also sucks, but it is easier to teach a DBA how to maintain a list of linked servers than it is to teach them to maintain Data Collector.
edit:
This link was interesting-- http://www.sommarskog.se/query-plan-mysteries.html#linkedservers
I'm still trying to wrap my head around it. My management studio connection had sa rights, while the linked servers were using credentials with minimal permissions. The view has a subquery to pull the most recent record off the history table, so looks like i need to do more research.

Selecting everything from all the servers is probably just going to run slowly. Without filtering or anything in place to guide processing, this is just down to scans, how fast your disks are, how much memory you have. There's no real way to speed it up by query tuning or indexing. It's just a hardware or a process issue to make it run faster.

On a side note, the only people who will see your question are the ones who have responded to this thread. If you really want to get a lot more people to look at it, you'd be better off posting a new question.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning