TSQL Faster Than Stored Procedure?

  • SQL 2000, SP3A.

    I set up a view into our HR database for another app to query employee information. This view is necessary because the other app is only allowed to see certain information in this highly secure area.

    The TSQL query that returns data from this view runs in less than one second, but the same TSQL in a stored procedure, with one argument, takes 12 seconds to run, every time. Adding WITH RECOMPILE to the procedure did not make a difference, nor did copying the argument into a newly declared variable inside the procedure, as Maddogs suggested in another thread.
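
    For reference, the local-variable version I tried looked roughly like this (a sketch from memory; the procedure name here is just a placeholder for the test copy, the real procedure source is further down):

    CREATE PROCEDURE [dbo].[VPA_UserInfo_Retrieve_Test]   -- placeholder name for the test variant
     @NetworkID NVARCHAR(100)
    AS
    BEGIN
     -- copy the parameter into a local variable and filter on that instead
     DECLARE @LocalID NVARCHAR(100)
     SET @LocalID = @NetworkID

     SELECT FirstName as FirstName, LastName as LastName, EmpNumber as UserID, UnitNumber as UnitNumber,
            EmailAddress as EmailAddress, WorkPhone as PhoneNumber
     FROM VHRDevTest.dbo.vwlaEmployeeListVPS
     WHERE EmploymentStatus = 'Active' and EmailAddress = @LocalID
    END
    GO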

    View Source:

    CREATE VIEW dbo.vwlaEmployeeListVPS
    AS
    /*
    For VPS to obtain a list of employees.
    Version 1.01, 12-14-2006, Chris Stamey
    */
    SELECT TOP 100 PERCENT
           EB.EbFlxID AS EmployeeListID, EB.EbClock AS EmpNumber, EB.EbPSID AS PSID, EB.EbFirstName AS FirstName,
           EB.EbLastName AS LastName, EP.EpEMail AS EmailAddress, EJ.EjWorkPhone AS WorkPhone, EJ.EjDivision AS UnitNumber,
           EE.EeStatus AS EmploymentStatus, EJ.EjRetired AS CompLevelCode, EE.EeTermDate AS TerminationDate,
           EJ.ejTitle AS JobTitle, EJ.EjSupervisorFlxIDeb AS SupervisorFlxID
    FROM dbo.EBase AS EB
         INNER JOIN dbo.EEmploy AS EE ON EB.EbFlxID = EE.EeFlxIDEb
         INNER JOIN dbo.EJob AS EJ ON EB.EbFlxID = EJ.EjFlxIDEb
         INNER JOIN dbo.EPerson AS EP ON EB.EbFlxID = EP.EpFlxIDeb
    WHERE EP.EpDateBeg = (SELECT MAX(EpDateBeg) FROM dbo.EPerson WHERE EpFlxIDEb = EB.EbFlxID)
      AND EJ.EjDateBeg = (SELECT MAX(EjDateBeg) FROM dbo.EJob WHERE EjFlxIDEb = EB.EbFlxID)
      AND EE.EeDateBeg = (SELECT MAX(EeDateBeg) FROM dbo.EEmploy WHERE EeFlxIDEb = EB.EbFlxID)
      AND (EE.EeTermDate > '8/1/2006' OR EE.EeTermDate IS NULL)
    ORDER BY EB.EbLastName, EB.EbFirstName

    TSQL that runs in 1 second:

    SELECT FirstName as FirstName,LastName as LastName,EmpNumber as UserID,UnitNumber as UnitNumber,EmailAddress as EmailAddress

        ,WorkPhone as PhoneNumber FROM VHRDevTest.dbo.vwlaEmployeeListVPS  WHERE EmploymentStatus = 'Active' and EmailAddress = 'psridhar@landam.com'

    Procedure (runs in 12 seconds):

    VPA_UserInfo_Retrieve  'psridhar@landam.com'

    Procedure Source:

    CREATE PROCEDURE [dbo].[VPA_UserInfo_Retrieve]

     @NetworkID NVARCHAR(100) With recompile

    AS

    BEGIN

     SELECT FirstName as FirstName,LastName as LastName,EmpNumber as UserID,UnitNumber as UnitNumber,EmailAddress as EmailAddress

        ,WorkPhone as PhoneNumber FROM VHRDevTest.dbo.vwlaEmployeeListVPS  WHERE EmploymentStatus = 'Active' and EmailAddress =  @NetworkID

    END

    GO

    I have noticed that if I pass the email address into the TSQL as a variable, it also slows down tremendously (12 seconds):

    Declare @NetworkID NVARCHAR(100)

    Set @NetworkID = 'psridhar@landam.com'

    SELECT FirstName as FirstName,LastName as LastName,EmpNumber as UserID,UnitNumber as UnitNumber,EmailAddress as EmailAddress

        ,WorkPhone as PhoneNumber FROM VHRDevTest.dbo.vwlaEmployeeListVPS  WHERE EmploymentStatus = 'Active' and EmailAddress = @NetworkID

    When I looked at the query plans I saw that the indexes were used differently. In the fast version, the one index it uses accounts for 99% of the execution time and returns 1 row. In the slow versions, the plan uses that same index in two places, each at 23% of the execution time, with over 228,000 rows returned from each; there is another index involved that returns 148,000 rows, too.
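
    If anyone wants to see the same comparison, something along these lines in Query Analyzer should reproduce it (just a sketch; SHOWPLAN_TEXT shows the estimated plan without running the query, STATISTICS PROFILE runs it and shows actual per-operator row counts):

    -- Estimated plan for the fast, literal version (not executed)
    SET SHOWPLAN_TEXT ON
    GO
    SELECT FirstName, LastName, EmpNumber AS UserID, UnitNumber, EmailAddress, WorkPhone AS PhoneNumber
    FROM VHRDevTest.dbo.vwlaEmployeeListVPS
    WHERE EmploymentStatus = 'Active' AND EmailAddress = 'psridhar@landam.com'
    GO
    SET SHOWPLAN_TEXT OFF
    GO

    -- Actual plan and row counts for the slow, variable version (this one does execute)
    SET STATISTICS PROFILE ON
    GO
    DECLARE @NetworkID NVARCHAR(100)
    SET @NetworkID = 'psridhar@landam.com'
    SELECT FirstName, LastName, EmpNumber AS UserID, UnitNumber, EmailAddress, WorkPhone AS PhoneNumber
    FROM VHRDevTest.dbo.vwlaEmployeeListVPS
    WHERE EmploymentStatus = 'Active' AND EmailAddress = @NetworkID
    GO
    SET STATISTICS PROFILE OFF
    GO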

    I don't understand why using an argument/variable instead of a literal can make that much difference in a query plan. Is SQL Server that dumb? What am I missing?

    Thanks,

    Chris

  • What is the datatype of EmailAddress? varchar(100) or nvarchar(100)?

     

    The epEmail field is VarChar(255) in the table behind the view. I have tried it both ways, nvarchar and varchar, and it doesn't make much difference.

    12 seconds:

    Declare @NetworkID VARCHAR(255)

    Set @NetworkID = 'psridhar@landam.com'

    SELECT FirstName as FirstName,LastName as LastName,EmpNumber as UserID,UnitNumber as UnitNumber,EmailAddress as EmailAddress

        ,WorkPhone as PhoneNumber FROM VHRDevTest.dbo.vwlaEmployeeListVPS  WHERE EmploymentStatus = 'Active' and EmailAddress = @NetworkID

    9 seconds:

    Declare @NetworkID NVARCHAR(255)

    Set @NetworkID = 'psridhar@landam.com'

    SELECT FirstName as FirstName,LastName as LastName,EmpNumber as UserID,UnitNumber as UnitNumber,EmailAddress as EmailAddress

        ,WorkPhone as PhoneNumber FROM VHRDevTest.dbo.vwlaEmployeeListVPS  WHERE EmploymentStatus = 'Active' and EmailAddress = @NetworkID

    9 seconds:

    Declare @NetworkID NVARCHAR(100)

    Set @NetworkID = 'psridhar@landam.com'

    SELECT FirstName as FirstName,LastName as LastName,EmpNumber as UserID,UnitNumber as UnitNumber,EmailAddress as EmailAddress

        ,WorkPhone as PhoneNumber FROM VHRDevTest.dbo.vwlaEmployeeListVPS  WHERE EmploymentStatus = 'Active' and EmailAddress = @NetworkID

    12 seconds:

    Declare @NetworkID VARCHAR(100)

    Set @NetworkID = 'psridhar@landam.com'

    SELECT FirstName as FirstName,LastName as LastName,EmpNumber as UserID,UnitNumber as UnitNumber,EmailAddress as EmailAddress

        ,WorkPhone as PhoneNumber FROM VHRDevTest.dbo.vwlaEmployeeListVPS  WHERE EmploymentStatus = 'Active' and EmailAddress = @NetworkID

    Thanks,

    Chris

     

     

  • @NetworkId should be varchar(255) to avoid datatype precedence issues.

    Putting order into a view is not a good idea. Try removing the TOP 100 PERCENT and ORDER BY from the view. An ORDER BY can then be added to the query.
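
    Something like this on the calling side once the view no longer has the TOP 100 PERCENT / ORDER BY (a sketch only; the variable is sized to match the varchar(255) column so no implicit conversion is needed):

    DECLARE @NetworkID VARCHAR(255)        -- matches epEmail's varchar(255)
    SET @NetworkID = 'psridhar@landam.com'

    SELECT FirstName, LastName, EmpNumber AS UserID, UnitNumber, EmailAddress, WorkPhone AS PhoneNumber
    FROM VHRDevTest.dbo.vwlaEmployeeListVPS
    WHERE EmploymentStatus = 'Active' AND EmailAddress = @NetworkID
    ORDER BY LastName, FirstName           -- ordering done in the query, not in the view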

     

     

  • OK, done. Removed ordering and the Top 100 percent from the view. Ran the following, but still 12 seconds:

    Declare @NetworkID VARCHAR(255)

    Set @NetworkID = 'psridhar@landam.com'

    SELECT FirstName as FirstName,LastName as LastName,EmpNumber as UserID,UnitNumber as UnitNumber,EmailAddress as EmailAddress

        ,WorkPhone as PhoneNumber FROM VHRDevTest.dbo.vwlaEmployeeListVPS  WHERE EmploymentStatus = 'Active' and EmailAddress = @NetworkID

     

    Thanks,

    Chris

  • Sorry, that exhausts my suggestions.

     

  • Hi,

    First, clear your procedure cache using this statement:

    DBCC FREEPROCCACHE

    Then re-run your view. If it runs fine after that, then it is the network access to the database that is slow.

    And is it true that the application takes a lot of time to access data from the database?

    Regards,

    Amit G.

  • Ummm.... take the ORDER BY out of the view... you should do the order by in procs that reference the view, not in the view itself.

    --Jeff Moden



  • Amit, using that statement to clear the proc cache made the TSQL much worse. The same TSQL, with variables passed to it, took 26 seconds to run! The stored procedure execution time did not change. The TSQL with the literal arguments still runs in 1 second.

    The view in question takes 10 seconds to return 16,500 rows if no criteria are specified, and it is no slower when the Order By is used. As the times above show, the criteria make it slower when passed in as a variable, even though fewer records are returned.

    Jeff, I already removed the order by from the view, per Ken's suggestion.

    Thanks for the ideas guys,

    Chris

  • Two more thoughts for the view:

    1. If you have indexes on the Foreign Keys, you could try explicitly using INNER MERGE JOIN instead of INNER JOIN.

    2. I tend to avoid correlated sub-queries as they can be slow. Try joining to derived tables instead (a sketch follows below).
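
    For example, the three MAX() correlated sub-queries in the view could become derived tables along these lines (a sketch only, using the table and column names from the view above; not tested against your data, and a few of the view's output columns are omitted for brevity):

    SELECT EB.EbFlxID AS EmployeeListID, EB.EbClock AS EmpNumber, EB.EbFirstName AS FirstName, EB.EbLastName AS LastName,
           EP.EpEMail AS EmailAddress, EJ.EjWorkPhone AS WorkPhone, EJ.EjDivision AS UnitNumber, EE.EeStatus AS EmploymentStatus
    FROM dbo.EBase AS EB
         INNER JOIN dbo.EEmploy AS EE ON EB.EbFlxID = EE.EeFlxIDEb
         INNER JOIN (SELECT EeFlxIDEb, MAX(EeDateBeg) AS MaxDateBeg
                     FROM dbo.EEmploy GROUP BY EeFlxIDEb) AS EEmax
                ON EEmax.EeFlxIDEb = EE.EeFlxIDEb AND EEmax.MaxDateBeg = EE.EeDateBeg
         INNER JOIN dbo.EJob AS EJ ON EB.EbFlxID = EJ.EjFlxIDEb
         INNER JOIN (SELECT EjFlxIDEb, MAX(EjDateBeg) AS MaxDateBeg
                     FROM dbo.EJob GROUP BY EjFlxIDEb) AS EJmax
                ON EJmax.EjFlxIDEb = EJ.EjFlxIDEb AND EJmax.MaxDateBeg = EJ.EjDateBeg
         INNER JOIN dbo.EPerson AS EP ON EB.EbFlxID = EP.EpFlxIDeb
         INNER JOIN (SELECT EpFlxIDEb, MAX(EpDateBeg) AS MaxDateBeg
                     FROM dbo.EPerson GROUP BY EpFlxIDEb) AS EPmax
                ON EPmax.EpFlxIDEb = EP.EpFlxIDeb AND EPmax.MaxDateBeg = EP.EpDateBeg
    WHERE EE.EeTermDate > '8/1/2006' OR EE.EeTermDate IS NULL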

     

  • Believe it or not, Chris, all database engines are "that dumb".  I first encountered this phenomenon over 5 years ago and thought the same thing, "Is SQL Server really that stupid?"  I never really got an answer to my question, but learned to accept it.

    Then, a few months later, I was in a meeting with a bunch of developers from other departments, and one person was talking on the side about a simple query he had that, when executed with a variable, ran significantly slower than when executed with a hard-coded value. He was using an IBM DB2 database.

    Another year went by and I switched jobs to a company that used Oracle.  Guess what, Oracle has the same issue.

    So really, to answer your question, no, SQL Server isn't stupid.  It has to do with the optimizer and what information it has when it compiles your statement.  Loosely speaking, it comes down to cardinality and other statistics information.  When a SQL statement is compiled and an execution plan is built, all available information is considered to determine the best plan.  If you have a variable in the statement, then obviously the compiler cannot know the value at compile time, so it takes a more general approach to the execution plan.

    The easiest example would be this.  Suppose you have a table and an index on a boolean column.  However, the table is heavily weighted such that 99% of the rows have a value of 'true' for that boolean column and just 1% of the rows have a value of 'false'.

    If you put together a SELECT statement like the following:

    SELECT * FROM boolTable WHERE boolColumn = 'false';

    When the optimizer compiles this statement, it will know up front (assuming recent statistics on the table) that the table is heavily weighted and that using the index on the column to select all rows with a value of 'false' is the most efficient approach.

    Now, let's say you change the query and have a hard coded 'true' value in there:

    SELECT * FROM boolTable WHERE boolColumn = 'true';

    Again, the optimizer knows everything upfront and this time, it will elect to do a full table scan because it knows it is going to return 99% of the rows, so a full table scan will be faster than using the index.

    Lastly, if you changed the statement to use a variable:

    SELECT * FROM boolTable WHERE boolColumn = @boolValue;

    The optimizer doesn't know what is best to do, so it takes a general approach, which may or may not be the 'right' one.

    Hopefully that kind of clears it up for you.
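
    If you want to reproduce the example yourself, a throwaway setup along these lines will do it (hypothetical names; here the 'boolean' column is just a small varchar holding 'true'/'false'):

    -- Build a heavily skewed table: ~99% of rows are 'true', ~1% are 'false'
    CREATE TABLE boolTable
    (
        id         INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
        boolColumn VARCHAR(5) NOT NULL
    )
    CREATE INDEX IX_boolTable_boolColumn ON boolTable (boolColumn)

    DECLARE @i INT
    SET @i = 0
    WHILE @i < 100000
    BEGIN
        -- every 100th row gets 'false', the rest get 'true'
        INSERT INTO boolTable (boolColumn)
        VALUES (CASE WHEN @i % 100 = 0 THEN 'false' ELSE 'true' END)
        SET @i = @i + 1
    END

    UPDATE STATISTICS boolTable   -- make sure the optimizer can see the skew

    Then compare the plans for the three SELECT statements above.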

  • Kevin77, that kind of makes sense, but since the optimizer knows which field I am going to be comparing against, even though it doesn't yet know the value, wouldn't it see that the field holds all different values and know how it will handle it?

    In this case it's an email address field.

    Ken, are you saying I should join on my sub-queries, in a manner such as this:

    From Table T

    Inner Join (Select SubField From Table Where OtherField Is Null) S On T.Field = S.SubField

    I have seen this done, but wondered about its efficiency.

    Now, an update. My MS guy was here and noticed several instances of parallelism in the execution plan for the slow query, so he suggested adding the query option Option(Maxdop 1). This sped it up to 5 seconds on the first run and 1 second on a subsequent run, but the TSQL with the literal where clause still beats it. I have a case open with MS now, and hope to track this problem down definitively.

    Thanks for the help,

    Chris

  • Using a view in a stored proc that is then filtered on a parameter value from the stored proc "encourages" SQL Server to generalize the solution - i.e., fully instantiate the view, then cache it.  The first time you hit it, if the optimizer doesn't have a cached version of the view that it feels it can safely use, you will get a significant performance hit.

    A simple way to avoid this is to avoid using views in stored procs.  For example,

    CREATE PROCEDURE dbo.EmpInfo @NetworkID VARCHAR(255)
    AS
    SELECT
      EB.EbFirstName As FirstName
    , EB.EbLastName As LastName
    , EB.EbClock AS EmpNumber
    , EJ.EjDivision AS UnitNumber
    , EP.EpEMail As EmailAddress
    , EJ.EjWorkPhone As PhoneNumber
    FROM dbo.EBase AS EB
    JOIN dbo.EEmploy AS EE
      ON EB.EbFlxID = EE.EeFlxIDEb
    JOIN dbo.EJob AS EJ
      ON EB.EbFlxID = EJ.EjFlxIDEb
    JOIN dbo.EPerson AS EP
      ON EB.EbFlxID = EP.EpFlxIDeb
    WHERE EP.EpEMail = @NetworkID
      AND EE.EeStatus = 'Active'
      AND COALESCE(EE.EeTermDate, '9999-12-31') > '8/1/2006'
      AND EP.EpDateBeg =
          (SELECT MAX(EpDateBeg)
           FROM dbo.EPerson
           WHERE EpFlxIDEb = EB.EbFlxID)
      AND EJ.EjDateBeg =
          (SELECT MAX(EjDateBeg)
           FROM dbo.EJob
           WHERE EjFlxIDEb = EB.EbFlxID)
      AND EE.EeDateBeg =
          (SELECT MAX(EeDateBeg)
           FROM dbo.EEmploy
           WHERE EeFlxIDEb = EB.EbFlxID)

    SQL Server will then optimize performance more intelligently, because it doesn't have to populate a view and then throw all but one record away.  This is the kind of performance you see when you pass a fixed (non-variable) value.  Interestingly, when you do this directly from a T-SQL query against a view, the optimizer is smart enough not to instantiate the complete view; it tries to save work by reducing the rows AND columns it instantiates from the view to the ones your query wants.  Imperfect intelligence sometimes, in that it does more work than needed, but I'm guessing this is why your query performs so well in T-SQL with a fixed value, but not when using a variable.
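
    A call to it would then look like this (using the same test address from the earlier posts):

    EXEC dbo.EmpInfo 'psridhar@landam.com'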

     

  • Thanks for the insight, Malcom. In this case, for security purposes, I must use the view, as the developers and the application cannot be given access to the other fields in the view's tables. I am not opposed to optimizing the view more, though. I am experimenting with that now.

    Thanks,

    Chris

  • Kevin, taking your suggestion and putting the sub-queries into joins, I sped up the return of all 16,500 records by 25%: an initial 8 seconds and subsequent 6 seconds, vs. the old initial 10 seconds and subsequent 8 seconds. Query used: Select * from vwlaEmployeeListVPS

    With the other query, I found no difference in the return times. Query used:

    Declare @NetworkID VARCHAR(255)

    Select @NetworkID = 'psridhar@landam.com'

    SELECT FirstName as FirstName,LastName as LastName,EmpNumber as UserID,UnitNumber as UnitNumber,EmailAddress as EmailAddress

        ,WorkPhone as PhoneNumber FROM VHRDevTest.dbo.vwlaEmployeeListVPS2  WHERE EmploymentStatus = 'Active' and EmailAddress = @NetworkID

    Now, using either view along with the query hints seen below, I got the time down to 3 seconds under repeated runs. Query used:

    Declare @NetworkID VARCHAR(255)

    Select @NetworkID = 'psridhar@landam.com'

    SELECT FirstName as FirstName,LastName as LastName,EmpNumber as UserID,UnitNumber as UnitNumber,EmailAddress as EmailAddress

        ,WorkPhone as PhoneNumber FROM VHRDevTest.dbo.vwlaEmployeeListVPS2  WHERE EmploymentStatus = 'Active' and EmailAddress = @NetworkID

    Option(Maxdop 1, FAST 1)

    I just found something that makes it even faster, using FAST 100 instead of FAST 1. This increased the speed to 2 seconds rather than 3. I tried it with both views and got the same performance increase.
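
    That is, the same query as above with only the hint changed (shown in full for clarity):

    Declare @NetworkID VARCHAR(255)
    Select @NetworkID = 'psridhar@landam.com'
    SELECT FirstName as FirstName, LastName as LastName, EmpNumber as UserID, UnitNumber as UnitNumber, EmailAddress as EmailAddress,
        WorkPhone as PhoneNumber
    FROM VHRDevTest.dbo.vwlaEmployeeListVPS2
    WHERE EmploymentStatus = 'Active' and EmailAddress = @NetworkID
    Option(Maxdop 1, FAST 100)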

    Thanks,

    Chris
