Performance on Views that span databases on multiple servers

Question

Post reply

Performance on Views that span databases on multiple servers

Eric Anderson

SSC Journeyman

Points: 89
More actions
July 9, 2010 at 11:36 am

#226706

We have many databases at our company. There are two in particular both of which are on two different virtual servers but both virtual servers are on the same node (physical box). When we have a view that returns data that is joined from the two different servers, our performance is significantly slower than if the databases are both on the same logical server. Is there any thing we can look at in order to improve response time in this situation?
There are many times we join data from server to server and placing all of our databases on the same server isn't possible. If anyone has any insight it would be greatly appreciated.
Thanks! - Eric

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

SQL Marathoner SSC-Addicted Points: 417 More actions · Answer 1

You can try indexed view, which may solve the performance issue.

SQLRNNR SSC Guru Points: 281334 More actions · Answer 2

For this view, you are probably using linked servers. What is the setup of your linked server?

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Jeff Moden SSC Guru Points: 1004503 More actions · Answer 3

Eric Anderson (7/9/2010)
We have many databases at our company. There are two in particular both of which are on two different virtual servers but both virtual servers are on the same node (physical box). When we have a view that returns data that is joined from the two different servers, our performance is significantly slower than if the databases are both on the same logical server. Is there any thing we can look at in order to improve response time in this situation?
There are many times we join data from server to server and placing all of our databases on the same server isn't possible. If anyone has any insight it would be greatly appreciated.
Thanks! - Eric

How up to date does the data actually need to be?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004503 More actions · Answer 4

SQL Marathoner (7/11/2010)
You can try indexed view, which may solve the performance issue.

Ummm... Good idea but probably won't work here. See the following URL and look for "View Restrictions"...

http://technet.microsoft.com/en-us/library/cc917715.aspx

Where one of the restrictions is

"To create an index on a view in SQL Server 2005, the view definition must not contain: {snip} reference to a table or function in a different database"

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Grant Fritchey SSC Guru Points: 398938 More actions · Answer 5

If you're working with linked servers and running queries against them, you need to know that unless you specifically use a construct like OPENQUERY and pass in parameter values, all data is brought across the wired between the servers and then filtered and processed on the local server. This can lead to some truly horrific performance. While linked servers are available for this type of querying, that's not their primary function. They really should be treated as a mechanism for easy transfer of data and then process queries independently from the linked server. Querying, especially JOINs, etc., across linked servers is notoriously problematic.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 6

Grant Fritchey (7/12/2010)
If you're working with linked servers and running queries against them, you need to know that unless you specifically use a construct like OPENQUERY and pass in parameter values, all data is brought across the wired between the servers and then filtered and processed on the local server. This can lead to some truly horrific performance. While linked servers are available for this type of querying, that's not their primary function. They really should be treated as a mechanism for easy transfer of data and then process queries independently from the linked server. Querying, especially JOINs, etc., across linked servers is notoriously problematic.

Grant is spot on here (as usual).

My question is why is it you are so emphatic that you cannot have all databases (or at least the one's that cross-join) on the same server?

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Jeff Moden SSC Guru Points: 1004503 More actions · Answer 7

TheSQLGuru (7/13/2010)
Grant Fritchey (7/12/2010)
If you're working with linked servers and running queries against them, you need to know that unless you specifically use a construct like OPENQUERY and pass in parameter values, all data is brought across the wired between the servers and then filtered and processed on the local server. This can lead to some truly horrific performance. While linked servers are available for this type of querying, that's not their primary function. They really should be treated as a mechanism for easy transfer of data and then process queries independently from the linked server. Querying, especially JOINs, etc., across linked servers is notoriously problematic.
Grant is spot on here (as usual).
My question is why is it you are so emphatic that you cannot have all databases (or at least the one's that cross-join) on the same server?

Agreed on all points. I'll go one step further. These types of problems frequently stymie a great number of folks for long periods of time and they sometimes end up trying to build a shortcut that ends up costing them some data.

My recommendation would be to find an independent consultant like Kevin (made the post above) for a couple of days and let him have free access to your systems for a couple of days... folks like him can fix this type of problem pretty quickly and, in the process, will likely teach your people a bit, as well. In the process of doing such a fix, he'll likely do some testing which will also show you some latent performance problems that you probably don't even know you have but will greatly appreciate how fast your systems run after he does a little performance prestidigitation.

Even if you can only afford one day (make sure you have access to the systems setup BEFORE he gets there so no time is wasted), it's likely going to be very worth it.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Eric Anderson SSC Journeyman Points: 89 More actions · Answer 8

Hi there everyone, I will forward your comments to my management for their consideration. I agree we should get a SQL Guru of some type in here to take a look at things and offer suggestions.

The reason we don't have all of our databases on the same server is because of mandates by various software vendors we are using. "... our company will not provide support on problems reported by the customer if it is found that other databases are placed on the same server as the one on which our databases reside... "

Thanks again everyone... - Eric -

SQLRNNR SSC Guru Points: 281334 More actions · Answer 9

My 2 bits on Vendor support of their databases.

That's a load of crap. Vendors should support their database despite other databases being present. I know they pull that kind of stuff all the time - it really needs to change though.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Roger L Reid SSCrazy Points: 2004 More actions · Answer 10

I'm going to respectfully offer different advice - what you are trying to do is logically and theoretically correct. It wasn't real practical in the past, but may be today: DBA Answer #1: "It Depends".

I hadn't bothered to recheck my assumption that big queries with 4-part qualified names were too slow to bother with until about a year ago - and boy was I surprised.

I've seen seen dramatic improvement in the mature 2005 and in 2008, on multicore x64, when all servers involved are MS SQL Servers. (It has not helped when the LS is not MS SQL).

I'm suspecting it has something to do with the integration of some of Attunity's technology, which uses local caching of statistics from linked servers to help generate efficient distributed queries.

Expression of complex queries do need more old fashioned experimentation and benchmarking to lead the optimizer in the right direction. I tend to write a subquery per server and do the join on the results. The point is to limit the info as much as possible. That leads to some longer, more redundant queries, but they also tend to be easier to understand 6 months later.

The following example is trivial enough that it probably won't matter, but taking the same approach with "real" queries can pay off. BTW this has to be tested and benchmarked; play with it to find the best result. And not all problems are amenable to this.

/* query submitted to SRV2 */

select s1.mdate, s1.mstatus, s2.cli_name

from [SRV1].[CrapVendorDB].[dbo].[ticket] as s1

join [FabulousInHouseDB].[dbo].[Client] as s2

on s1.cli_nbr = s2.cli_nbr

where s2.cli_nbr = ?

/* the above query might be worse performing than the below one */

select s1.mdate, s1.mstatus, s2.cli_name from

FROM (select cli_nbr, cli_name from

[FabulousInHouseDB].[dbo].[Client] )

where s2.cli_nbr = ?) as s2

JOIN (select mdate, mstatus, cli_nbr

from [SRV1].[CrapVendorDB].[dbo].[ticket]

where s1.cli_nbr = ?) as s1

on s1.cli_nbr = s2.cli_nbr

Your goal is to push as much restriction as possible - as OBVIOUSLY as possible - to the linked server. If you have a choice, submit the query to the server that will provide the largest set. The performance of the above can be totally different if submitted to SVR1 natively with SVR2 as the linked.

So this is not yet DRDBMS utopia, but it's one heck of a lot closer than before.

I agree that the "we must stand alone" vendors are distasteful and disingenuous. However, I often isolate the "crap" vendors anyway - if they insist on running XP's, or installing ANYTHING on the dataserver box other than an database, then I want my other db's protected from them.

And most of us can't tell our management "don't buy that product that does just what you want, the people who wrote it are ignorant".

The improvement in linked servers, and better relational syntax in T-SQL such as "except" has transformed my ETL and verification coding. I write queries that were unthinkable just a few years back. They tend to be a clear expression of what I really want to accomplish, but after 25 years of having to avoid the "obvious" expression, it takes some rethinking to get to

insert blah, yada into mydb

select ecch, foo from thatdb

except

select blah,yada from mydb

vs os scripts to

bcp both out to os;

run a diff;

parse result to compose inserts,

run inserts

One foundation of RDBMS (largely ignored) is to not care about how the data are stored (including one instance vs another - as long as you have rights). That's not so easy in practice, but it is much easier today than it was, at least when they are all MS SQL Servers. We missed the boat on DRDBMS probably 15 years ago, and that boat may never sail, but we've got more than we once had.

Roger Reid

Roger L Reid

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 11

Roger L Reid (7/16/2010)
I'm going to respectfully offer different advice - what you are trying to do is logically and theoretically correct. It wasn't real practical in the past, but may be today: DBA Answer #1: "It Depends".
I hadn't bothered to recheck my assumption that big queries with 4-part qualified names were too slow to bother with until about a year ago - and boy was I surprised.
I've seen seen dramatic improvement in the mature 2005 and in 2008, on multicore x64, when all servers involved are MS SQL Servers. (It has not helped when the LS is not MS SQL).
I'm suspecting it has something to do with the integration of some of Attunity's technology, which uses local caching of statistics from linked servers to help generate efficient distributed queries.
Expression of complex queries do need more old fashioned experimentation and benchmarking to lead the optimizer in the right direction. I tend to write a subquery per server and do the join on the results. The point is to limit the info as much as possible. That leads to some longer, more redundant queries, but they also tend to be easier to understand 6 months later.
The following example is trivial enough that it probably won't matter, but taking the same approach with "real" queries can pay off. BTW this has to be tested and benchmarked; play with it to find the best result. And not all problems are amenable to this.
/* query submitted to SRV2 */
select s1.mdate, s1.mstatus, s2.cli_name
from [SRV1].[CrapVendorDB].[dbo].[ticket] as s1
join [FabulousInHouseDB].[dbo].[Client] as s2
on s1.cli_nbr = s2.cli_nbr
where s2.cli_nbr = ?
/* the above query might be worse performing than the below one */
select s1.mdate, s1.mstatus, s2.cli_name from
FROM (select cli_nbr, cli_name from
[FabulousInHouseDB].[dbo].[Client] )
where s2.cli_nbr = ?) as s2
JOIN (select mdate, mstatus, cli_nbr
from [SRV1].[CrapVendorDB].[dbo].[ticket]
where s1.cli_nbr = ?) as s1
on s1.cli_nbr = s2.cli_nbr
Your goal is to push as much restriction as possible - as OBVIOUSLY as possible - to the linked server. If you have a choice, submit the query to the server that will provide the largest set. The performance of the above can be totally different if submitted to SVR1 natively with SVR2 as the linked.
So this is not yet DRDBMS utopia, but it's one heck of a lot closer than before.
I agree that the "we must stand alone" vendors are distasteful and disingenuous. However, I often isolate the "crap" vendors anyway - if they insist on running XP's, or installing ANYTHING on the dataserver box other than an database, then I want my other db's protected from them.
And most of us can't tell our management "don't buy that product that does just what you want, the people who wrote it are ignorant".
The improvement in linked servers, and better relational syntax in T-SQL such as "except" has transformed my ETL and verification coding. I write queries that were unthinkable just a few years back. They tend to be a clear expression of what I really want to accomplish, but after 25 years of having to avoid the "obvious" expression, it takes some rethinking to get to
insert blah, yada into mydb
select ecch, foo from thatdb
except
select blah,yada from mydb
vs os scripts to
bcp both out to os;
run a diff;
parse result to compose inserts,
run inserts
One foundation of RDBMS (largely ignored) is to not care about how the data are stored (including one instance vs another - as long as you have rights). That's not so easy in practice, but it is much easier today than it was, at least when they are all MS SQL Servers. We missed the boat on DRDBMS probably 15 years ago, and that boat may never sail, but we've got more than we once had.
Roger Reid

Sorry Roger, your example is incorrect (disregarding the extra from in the second part of the first code sample). The optimizer can rearrange joins and push predicates up and down, etc, etc however it wishes as long as it algebraically gives the same results. And both of your first examples will likely get you EXACTLY the same plan - the engine does NOT (necessarily) do SELECTs inside parentheses before outer stuff like your rewrite and statements suggest.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Jeff Moden SSC Guru Points: 1004503 More actions · Answer 12

CirquedeSQLeil (7/15/2010)
My 2 bits on Vendor support of their databases.
That's a load of crap. Vendors should support their database despite other databases being present. I know they pull that kind of stuff all the time - it really needs to change though.

Man, do I ever agree with that. But, it does keep things simple not only for them but for me, as well. I love it when I can call up a vendor and say "Of course it's your code causing the problem... your code is the only thing on the whole ^%*$^#%T@! machine!" 😛

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Roger L Reid SSCrazy Points: 2004 More actions · Answer 13

You are correct in that the example is too simplistic - the compiler will easily find all the plans before running out of time. In fact, there aren't many plans to consider on that one.

The algebra is indeed identical - that's exactly the point.

I have far more complex queries where it makes a huge difference. The compiler is free to make whatever plan it wants, and is almost always smarter than I am.

What is changes is the parse of the SQL. The set described is the same, but the order in which possible plans are evaluated clearly changes, and the "good enough" plan that results can be different. Doesn't have to be.

This may not matter beans to the original poster. I don't know anything about that data or the systems.

BTW, I don't particularly like the idea of playing with syntax to manipulate the compiler. I have way too much code around here that was syntactically optimized for single CPU, 16 bit, 32 MB memory machines - developers don't understand why their code runs WORSE on a 16 way 64-bit with 128 GB memory.

No promises the example is free of error - name of objects and servers etc have been changed so it can be posted. The original can run in one second; minor syntactical changes can make it eternal.

Roger Reid

USE [DatabaseOne]

GO

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

-- static char *SCCS = "@(#) KL_Verify.sql 1.1 06/08/10";

-- Since the data integration uses periodica updates,

-- TRANSIENT differences are to be expected.

-- Instead, the diffs need to be recorded day-to-day.

-- If any differences persists one day to the next, then we report a problem.

-- It's OK to send notice, but be sure it doesn't cause alarm.

-----------------------------------------------------------------

-- WHAT TO CHECK? Everything...but practically, start with this:

-- On 3 sources

-- the SQL Server DatabaseOne/DatabaseTwo copies,

-- the KL_Warehouse staging area,

-- inside the LK database (reverse engineered)

-- Contrast the values of (for custs between 1000 and 99998)

-- mcase status (DatabaseOne(i,r) = lk(i), DatabaseOne(a) = lk(a))

-- mcase list (same list of #'s)

-- close_date (identical)

-- open date (reflect the complex logic)

-- so-called last_invoice_date

-- so-called last_billed_date

-- so-called wip_date

--- DO NOT TAKE THE NAMES OF THOSE DATES AT FACE VALUE.

-- The definitions of those dates are as defined late May 2010 after a review

-- by XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

-- They do not always reflect the name they are reported as.

-----------------------------------------------------------------

-- Add date code to the below, then you can do an except against a

-- previous set of values.

-- Persistent conflicts are a problem

-- SQL SERVER 2005: Run this from the server with the LEFT-most table; other wise the LOJ's will take forever

use DatabaseOne;

CREATE VIEW [dbo].[kl_verify] AS

select c.custno, c.caseno

, c.DatabaseOne_mcase_status

, d.fooco_mcase_status

, l.lk_mcase_status

, [DatabaseOne_last_invoice_date]

, [fooco_last_invoice_date]

, [lk_last_invoice_date]

, [DatabaseOne_last_billed_date]

, [fooco_last_billed_date]

, [lk_last_billed_date]

, [DatabaseOne_wip_date]

, [fooco_wip_date]

, [lk_wip_date]

FROM ( -- Get DatabaseOne

select m.custno

, m.caseno

, CASE

-- If a new status in DatabaseOne (not r,i,a) generates a mismatch,

-- that requires a change in the extraction logic

WHEN m.mcase_status = 'r' THEN 'i'

else m.mcase_status

END collate SQL_Latin1_General_CP1_CI_AS AS [DatabaseOne_mcase_status]

, h.last_invoice_date AS [DatabaseOne_last_invoice_date]

, h.max_bill_date AS [DatabaseOne_last_billed_date]

, h.wip_date AS [DatabaseOne_wip_date]

from [DatabaseOne].[dbo].[Mcase] as m

LEFT OUTER JOIN [DatabaseOne].[dbo].KL_Dates as h

ON h.custno = m.custno and h.caseno = m.caseno

where m.custno between 1000 and 99998

) as c

LEFT OUTER JOIN ( -- Get LK: an LOJ because we need to find missings

select convert(int, m.cust_id) as custno

, convert(int, m.mcase_id) as caseno

, m.mm_status as lk_mcase_status

, i.[mm_invoice_date] as lk_last_invoice_date

, i.[mm_bill_date] as lk_last_billed_date

, i.[mm_wip_date] as lk_wip_date

-- , m.mm_date_open

-- , m.mm_date_closed

FROM [SQL_KL].[kldat].[dbo].[mcase] as m

JOIN [SQL_KL].[kldat].[dbo].[matinfo] as i

on m.[cust_id] = i.[cust_id]

and m.[mcase_id] = i.[mcase_id]

where m.cust_id between '01000' and '99998'

and m.mcase_id between '00001' and '00999'

) AS l

ON l.custno = c.custno and l.caseno = c.caseno

LEFT OUTER JOIN ( -- Get FOOCO: an LOJ because we need to find missings

select convert(int, cust_id) as custno

, convert(int, mcase_id) as caseno

, mcase_status as fooco_mcase_status

, last_invoice_date AS [fooco_last_invoice_date]

, last_billed_date AS [fooco_last_billed_date]

, wip_date AS [fooco_wip_date]

from [SQL_AA\DEPT22].kl_warehouse.dbo.Mcase

where cust_id between '01000' and '99998'

) AS d

ON d.custno = c.custno and d.caseno = c.caseno

where

(c.DatabaseOne_mcase_status <> d.fooco_mcase_status or c.DatabaseOne_mcase_status <> l.lk_mcase_status)

or

(c.DatabaseOne_last_invoice_date <> d.fooco_last_invoice_date or c.DatabaseOne_last_invoice_date <> l.lk_last_invoice_date)

or

(c.DatabaseOne_last_billed_date <> d.fooco_last_billed_date or c.DatabaseOne_last_billed_date <> l.lk_last_billed_date)

or

(c.DatabaseOne_wip_date <> d.fooco_wip_date or c.DatabaseOne_wip_date <> l.lk_wip_date)

Roger L Reid

Jeff Moden SSC Guru Points: 1004503 More actions · Answer 14

Ummmm... cross server joins are quite a bit different than cross database joins. I try to avoid cross server joins whenever possible. In fact, I've gone out of my way to convince folks to move the data just to avoid cross server joins.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)