February 27, 2007 at 8:02 am
SQL 2000, SP3A.
I set up a view into our HR database for another app to query employee information. This view is necessary because the other app is only allowed to see certain information in this highly secure area.
The TSQL query to return data from this view works in less than one second, but the same TSQL in a stored procedure, with one argument, takes 12 seconds to run, every time. Adding recompile to the procedure did not make a difference, neither did replacing the argument with a new declared var inside the procedure, as Maddogs suggested in another thread.
View Source:
CREATE VIEW dbo.vwlaEmployeeListVPS
AS
/*
For VPS to obtain a list of employees.
Version 1.01, 12-14-2006, Chris Stamey
*/
SELECT TOP 100 PERCENT EB.EbFlxID As EmployeeListID, EB.EbClock AS EmpNumber, EB.EbPSID As PSID, EB.EbFirstName As FirstName,
EB.EbLastName As LastName, EP.EpEMail As EmailAddress, EJ.EjWorkPhone As WorkPhone, EJ.EjDivision AS UnitNumber, EE.EeStatus AS EmploymentStatus,
EJ.EjRetired AS CompLevelCode, EE.EeTermDate AS TerminationDate, EJ.ejTitle As JobTitle,
EJ.EjSupervisorFlxIDeb AS SupervisorFlxID
FROM dbo.EBase AS EB INNER JOIN
dbo.EEmploy AS EE ON EB.EbFlxID = EE.EeFlxIDEb INNER JOIN
dbo.EJob AS EJ ON EB.EbFlxID = EJ.EjFlxIDEb INNER JOIN
dbo.EPerson AS EP ON EB.EbFlxID = EP.EpFlxIDeb
WHERE (EP.EpDateBeg =
(SELECT MAX(EpDateBeg) AS Expr1
FROM dbo.EPerson
WHERE (EpFlxIDEb = EB.EbFlxID))) AND (EJ.EjDateBeg =
(SELECT MAX(EjDateBeg) AS Expr1
FROM dbo.EJob
WHERE (EjFlxIDEb = EB.EbFlxID))) AND (EE.EeDateBeg =
(SELECT MAX(EeDateBeg) AS Expr1
FROM dbo.EEmploy
WHERE (EeFlxIDEb = EB.EbFlxID))) AND (EE.EeTermDate > '8/1/2006' OR
EE.EeTermDate IS NULL)
ORDER BY EB.EbLastName, EB.EbFirstName
TSQL that runs in 1 second:
SELECT FirstName as FirstName,LastName as LastName,EmpNumber as UserID,UnitNumber as UnitNumber,EmailAddress as EmailAddress
,WorkPhone as PhoneNumber FROM VHRDevTest.dbo.vwlaEmployeeListVPS WHERE EmploymentStatus = 'Active' and EmailAddress = 'psridhar@landam.com'
Procedure (runs in 12 seconds):
VPA_UserInfo_Retrieve 'psridhar@landam.com'
Procedure Source:
CREATE PROCEDURE [dbo].[VPA_UserInfo_Retrieve]
@NetworkID NVARCHAR(100) With recompile
AS
BEGIN
SELECT FirstName as FirstName,LastName as LastName,EmpNumber as UserID,UnitNumber as UnitNumber,EmailAddress as EmailAddress
,WorkPhone as PhoneNumber FROM VHRDevTest.dbo.vwlaEmployeeListVPS WHERE EmploymentStatus = 'Active' and EmailAddress = @NetworkID
END
GO
I have noticed if I pass the email address as a parameter to the TSQL, it slows down tremendously also (12 seconds):
Declare @NetworkID NVARCHAR(100)
Set @NetworkID = 'psridhar@landam.com'
SELECT FirstName as FirstName,LastName as LastName,EmpNumber as UserID,UnitNumber as UnitNumber,EmailAddress as EmailAddress
,WorkPhone as PhoneNumber FROM VHRDevTest.dbo.vwlaEmployeeListVPS WHERE EmploymentStatus = 'Active' and EmailAddress = @NetworkID
When I looked at the query plans I saw that the indexes were used/not used differently. When it runs fast the one index it uses, and says was 99% of the execution time, says it returns 1 row. In the slow versions it says it uses the same index in 2 places, and each of those is 23% of the execution time and the rows returned are over 228,000. There is another index involved that has rows returned of 148,000, too.
I don't understand why using and argument/variable instead of a literal can make that much difference in a query plan. Is SQL Server that dumb? What am I missing?
Thanks,
Chris
February 27, 2007 at 8:13 am
What is the datatype of EmailAddress? varchar(100) or nvarchar(100)?
February 27, 2007 at 8:26 am
The epEmail field is VarChar(255), in the table behind the view. I have tried it both ways, nvarchar and varchar, and that doesn't make much difference.
12 seconds:
Declare @NetworkID VARCHAR(255)
Set @NetworkID = 'psridhar@landam.com'
SELECT FirstName as FirstName,LastName as LastName,EmpNumber as UserID,UnitNumber as UnitNumber,EmailAddress as EmailAddress
,WorkPhone as PhoneNumber FROM VHRDevTest.dbo.vwlaEmployeeListVPS WHERE EmploymentStatus = 'Active' and EmailAddress = @NetworkID
9 seconds:
Declare @NetworkID NVARCHAR(255)
Set @NetworkID = 'psridhar@landam.com'
SELECT FirstName as FirstName,LastName as LastName,EmpNumber as UserID,UnitNumber as UnitNumber,EmailAddress as EmailAddress
,WorkPhone as PhoneNumber FROM VHRDevTest.dbo.vwlaEmployeeListVPS WHERE EmploymentStatus = 'Active' and EmailAddress = @NetworkID
9 seconds:
Declare @NetworkID NVARCHAR(100)
Set @NetworkID = 'psridhar@landam.com'
SELECT FirstName as FirstName,LastName as LastName,EmpNumber as UserID,UnitNumber as UnitNumber,EmailAddress as EmailAddress
,WorkPhone as PhoneNumber FROM VHRDevTest.dbo.vwlaEmployeeListVPS WHERE EmploymentStatus = 'Active' and EmailAddress = @NetworkID
12 seconds:
Declare @NetworkID VARCHAR(100)
Set @NetworkID = 'psridhar@landam.com'
SELECT FirstName as FirstName,LastName as LastName,EmpNumber as UserID,UnitNumber as UnitNumber,EmailAddress as EmailAddress
,WorkPhone as PhoneNumber FROM VHRDevTest.dbo.vwlaEmployeeListVPS WHERE EmploymentStatus = 'Active' and EmailAddress = @NetworkID
Thanks,
Chris
February 27, 2007 at 8:35 am
@NetworkId should be varchar(255) to avoid datatype precedence issues.
Putting order into a view is not a good idea. Try removing the TOP 100 PERCENT and ORDER BY from the view. An ORDER BY can then be added to the query.
February 27, 2007 at 8:40 am
OK, done. Removed ordering and the Top 100 percent from the view. Ran the following, but still 12 seconds:
Declare @NetworkID VARCHAR(255)
Set @NetworkID = 'psridhar@landam.com'
SELECT FirstName as FirstName,LastName as LastName,EmpNumber as UserID,UnitNumber as UnitNumber,EmailAddress as EmailAddress
,WorkPhone as PhoneNumber FROM VHRDevTest.dbo.vwlaEmployeeListVPS WHERE EmploymentStatus = 'Active' and EmailAddress = @NetworkID
Thanks,
Chris
February 27, 2007 at 8:51 am
Sorry, that exhausts my suggestions.
February 28, 2007 at 4:15 am
Hi ,
First Clear ur procedure cache...
using THIS STATEMENT...
DBCC FREEPROCCACHE
and then re run ur view.. if it is fine then ur network who acess the database will be slower.
And its true that an application takes lots of time to acess data from a database??????
Regards,
Amit G.
February 28, 2007 at 7:08 am
Ummm.... take the ORDER BY out of the view... you should do the order by in procs that reference the view, not in the view itself.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 28, 2007 at 8:06 am
Amit, using that statement to clear the proc cache made the TSQL much worse. The same thing, TSQL, with variables passed to it, took 26 seconds to run! The stored procedure execution time did not change. The TSQL with the literal arguments still runs in 1 second.
The view in question takes 10 seconds to return 16,500 rows, if no criteria is specified, and is no slower if the Order By is used. As can be seen by the times I am getting with this query, the criteria make it slower when passed in as a variable, returning fewer records.
Jeff, already removed the order by from the view, as per suggestion by Ken.
Thanks for the ideas guys,
Chris
February 28, 2007 at 11:24 am
Two more thoughts for the view:
1. If you have indexes on the Foreign Keys, you could try explicitly using INNER MERGE JOIN instead of INNER JOIN.
2. I tend to avoid correlated sub-queries as they can be slow. Try joining to derived tables instead.
February 28, 2007 at 11:53 am
Believe it or not, Chris, all database engines are "that dumb". I first encountered this phenomenon over 5 years ago and thought the same thing, "Is SQL Server really that stupid?" I never really got an answer to my question, but learned to accept it.
Then, a few months later I was in a meeting with a bunch of developers from other departments and one person was talking on the side about a simple query he had, that when executed with a variable ran significantly slower than when executed with a hard coded value. He was using IBM DB2 database.
Another year went by and I switched jobs to a company that used Oracle. Guess what, Oracle has the same issue.
So really, to answer your question, no, SQL Server isn't stupid. It has to do with the optimizer and what information it has when it compiles your statement. Loosly speaking, it has to do with cardinality and other statistics information. When a SQL statement is compiled and an execution plan is made, all avaiable information is looked at to determine the most optimum plan. If you have a variable in the statement, then obviously the compiler cannot know the value at compile time, so it takes a more general approach at the execution plan.
The easiest example would be this. Suppose you have a table and an index on a boolean column. However, the table is heavily weighted such that 99% of the rows have a value of 'true' for that boolean column and just 1% of the rows have a value of 'false'.
If you put together a SELECT statement like the following:
SELECT * FROM boolTable WHERE boolColumn = false;
When the optimizer compiles this statement, it will know upfront (assuming recents statistics on the table) that the table is heavily weighted and that using the index on the column to select all rows with a value of 'false' is the most efficient.
Now, let's say you change the query and have a hard coded 'true' value in there:
SELECT * FROM boolTable WHERE boolColumn = true;
Again, the optimizer knows everything upfront and this time, it will elect to do a full table scan because it knows it is going to return 99% of the rows, so a full table scan will be faster than using the index.
Lastly, if you changed the statement to use a variable:
SELECT * FROM boolTable WHERE boolColumn = @boolValue;
The optimizer doesn't know what best to do, so it will take a general approach, which may or may not be the 'right' one.
Hopefully that kind of clears it up for you.
February 28, 2007 at 12:47 pm
Kevin77, that kind of makes sense, but wouldn't the optimizer, since it knows what the field is I am going to be comparing to, even though it doesn't yet know the value, say that this field has all different values in it, but I know how I will handle it?
In this case it's an email address field.
Ken, are you saying I should joiin on my sub-queries, in such a manner as this:
From Table T
Inner Join (Select SubField From Table Where OtherField Is Null) S On T.Field = S.SubField
I have seen this done, but wondered about its efficiency.
Now, an update. My MS guy was here and noticed that on the slow query it are several instances, in the execution plan, of parallelism going on so he suggested putting in the query option Option(Maxdop 1). This sped it up to 5 seconds on the first run and 1 secodn on a subsequent run, but the TSQL with literal where clause still beats it. I have a case open with MS now, and hope to track this problem down definitively.
Thanks for the help,
Chris
February 28, 2007 at 12:48 pm
Using a view in a stored proc that is then filtered based on a parameter value from the stored proc "encourages" SQL Server to generalize the solution - ie fully instantiated the view, then cache it. The first time you hit it, if it doesn't have a cached version of the view the optimizer feels it can safely use, then you will get a significant performance hit.
A simple way to avoid this is to avoid using views in stored procs. For example,
CREATE PROCEDURE dbo.EmpInfo @NetworkID VARCHAR(255)
AS
SELECT
EB.EbFirstName As FirstName
, EB.EbLastName As LastName
, EB.EbClock AS EmpNumber
, EJ.EjDivision AS UnitNumber
, EP.EpEMail As EmailAddress
, EJ.EjWorkPhone As PhoneNumber
FROM dbo.EBase AS EB
JOIN dbo.EEmploy AS EE
ON EB.EbFlxID = EE.EeFlxIDEb
JOIN dbo.EJob AS EJ
ON EB.EbFlxID = EJ.EjFlxIDEb
JOIN dbo.EPerson AS EP
ON EB.EbFlxID = EP.EpFlxIDeb
WHERE EP.EpEMail = @NetworkID
AND EE.EeStatus = 'Active'
AND COALESCE(EE.EeTermDate,'9999-12-31') > '8/1/2006')
AND EP.EpDateBeg =
(SELECT MAX(EpDateBeg) AS Expr1
FROM dbo.EPerson
WHERE EpFlxIDEb = EB.EbFlxID)
AND EJ.EjDateBeg =
(SELECT MAX(EjDateBeg) AS Expr1
FROM dbo.EJob
WHERE EjFlxIDEb = EB.EbFlxID)
AND EE.EeDateBeg =
(SELECT MAX(EeDateBeg) AS Expr1
FROM dbo.EEmploy
WHERE EeFlxIDEb = EB.EbFlxID)
SQL Server will then optimize performance more intelligently, because it doesn't have to populate a view, then throw all but one record away. This is the kind of performance you see when you pass a fixed (non variable) value. Interestingly, when you do this directly from a T-SQL query against a view, the optimizer is smart enough not to instantiate the complete view, but attempts to save work by reducing instantiating the number of rows AND columns in the view to the ones your query wants. Imperfect intelligence sometimes, in that it does more work than needed, but I'm guessing this is why your query performs so well in T-SQL with a fixed value, but not when using a variable.
February 28, 2007 at 12:59 pm
Thanks for the insite Malcom. In this case, for security purposes, I must use the view, as the developers and the application cannot have access to other fields in the view tables. I am not opposed to optmizing the view more though. I am experimenting with that now.
Thanks,
Chris
February 28, 2007 at 2:12 pm
Kevin, taking your suggestion and putting the sub-queries into joins, I sped up the return of all records, 16,500, by 25%, Initial 8 seconds and subsequent 6 seconds, vs the old 10 second and subsequent 8 seconds. Query used: Select * from vwlaEmployeeListVPS
With the other query, I found no difference in the return times. Query used:
Declare @NetworkID VARCHAR(255)
Select @NetworkID = 'psridhar@landam.com'
SELECT FirstName as FirstName,LastName as LastName,EmpNumber as UserID,UnitNumber as UnitNumber,EmailAddress as EmailAddress
,WorkPhone as PhoneNumber FROM VHRDevTest.dbo.vwlaEmployeeListVPS2 WHERE EmploymentStatus = 'Active' and EmailAddress = @NetworkID
Now, using either view along with these query optimizers, seen below, I got the speed down to 3 seconds, under repeated runs. Query used:
Declare @NetworkID VARCHAR(255)
Select @NetworkID = 'psridhar@landam.com'
SELECT FirstName as FirstName,LastName as LastName,EmpNumber as UserID,UnitNumber as UnitNumber,EmailAddress as EmailAddress
,WorkPhone as PhoneNumber FROM VHRDevTest.dbo.vwlaEmployeeListVPS2 WHERE EmploymentStatus = 'Active' and EmailAddress = @NetworkID
Option(Maxdop 1, FAST 1)
I just found something that makes it even faster, using FAST 100 instead of FAST 1. This increased the speed to 2 seconds rather than 3. I tried it with both views and got the same performance increase.
Thanks,
Chris
Viewing 15 posts - 1 through 15 (of 28 total)
You must be logged in to reply to this topic. Login to reply