Performance problem... how can I prevent it?

  • Practicing what I preach, here's the data (creates a million-row test table called jbmEmployee)...

    --=============================================================================
    -- Create a temporary test table with a million employees to stress test the
    -- code with. THIS IS NOT PART OF THE SOLUTION!!!!
    --=============================================================================
    --===== If the temp table exists, drop it
    -- IF OBJECT_ID('TempDB..jbmEmployee','U') IS NOT NULL
    --    DROP TABLE jbmEmployee

    --===== Create the temp table
    CREATE TABLE jbmEmployee
            (
            EmployeeID INT PRIMARY KEY CLUSTERED, --don't name PK's on temp tables
            FirstName  VARCHAR(30),
            LastName   VARCHAR(30)
            )

    ;WITH cteTally AS
    (--==== Generates a 1 million row test table with predictable names in about 23 seconds
     SELECT TOP 1000000
            ROW_NUMBER() OVER (ORDER BY ac1.Object_ID) AS N
       FROM Master.sys.ALL_Columns ac1
      CROSS JOIN Master.sys.ALL_Columns ac2
    )
     INSERT INTO jbmEmployee
            (EmployeeID, FirstName, LastName)
     SELECT EmployeeID = t.N,
            FirstName  = 'FirstName'+RIGHT(REPLICATE('0',10)+CAST(t.N AS VARCHAR(10)),10),
            LastName   = 'LastName' +RIGHT(REPLICATE('0',10)+CAST(t.N AS VARCHAR(10)),10)
       FROM cteTally t

    I have the following code which runs just nasty fast because of some wonderful merge-joins when I use the hardcoded values below...

    --===== Find and produce the page
    SELECT e.EmployeeID, e.FirstName, e.LastName
      FROM jbmEmployee e,
           (
            SELECT TOP (100) EmployeeID
              FROM jbmEmployee
             WHERE EmployeeID NOT IN (SELECT TOP (400000) EmployeeID
                                        FROM jbmEmployee
                                       ORDER BY EmployeeID)
             ORDER BY EmployeeID
           ) d
     WHERE e.EmployeeID = d.EmployeeID

    Returns in the blink of an eye in SSMS.

    Now, when I convert the hard-coded constants to preset variables and run it...

    --===== Set the local variables for pagesize and pagenumber
         -- PageSize and PageNum would be parameters in a stored proc
    DECLARE @PageSize INT
    DECLARE @PageNum  INT
    DECLARE @Skip     INT
    SET @PageSize = 100
    SET @PageNum  = 4000
    SET @Skip     = @PageSize*@PageNum

    --===== Find and produce the page
    SELECT e.EmployeeID, e.FirstName, e.LastName
      FROM jbmEmployee e,
           (
            SELECT TOP (@PageSize) EmployeeID
              FROM jbmEmployee
             WHERE EmployeeID NOT IN (SELECT TOP (@Skip) EmployeeID
                                        FROM jbmEmployee
                                       ORDER BY EmployeeID)
             ORDER BY EmployeeID
           ) d
     WHERE e.EmployeeID = d.EmployeeID

    ...and it takes about a million years because the optimizer converts the merge-join to a nested-loop for this.

    How can I keep this from happening? Am I going to have to use join hints to make this work, or is there a setting I might have accidentally set somewhere, or ???

    I'm thinking I don't like 2k5 very much if the optimizer is going to do the parameter sniffing thing even on such simple code...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Did you try building your query as dynamic SQL and then executing it?
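    Something along these lines (just a rough sketch reusing your variables - the values get embedded as literals, so the optimizer compiles against constants):

    --===== Rough sketch only: build the paging query with the values embedded
         -- as literals, then execute the string.
    DECLARE @PageSize INT, @PageNum INT, @Skip INT, @SQL NVARCHAR(2000)
    SET @PageSize = 100
    SET @PageNum  = 4000
    SET @Skip     = @PageSize*@PageNum

    SET @SQL = N'
    SELECT e.EmployeeID, e.FirstName, e.LastName
      FROM jbmEmployee e,
           (SELECT TOP (' + CAST(@PageSize AS NVARCHAR(10)) + N') EmployeeID
              FROM jbmEmployee
             WHERE EmployeeID NOT IN (SELECT TOP (' + CAST(@Skip AS NVARCHAR(10)) + N') EmployeeID
                                        FROM jbmEmployee
                                       ORDER BY EmployeeID)
             ORDER BY EmployeeID) d
     WHERE e.EmployeeID = d.EmployeeID'

    EXEC (@SQL)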

  • Oh yeah... and, of course, it works just fine. But any idea on how to make it work without being dynamic? Seems the optimizer is making a really poor choice and there's gotta be some way to get it to hammer through this without having to resort to optimizer "tricks".

    --Jeff Moden



  • That is an interesting issue. I can't seem to see a way to force your syntax to be smart about it.

    Of course - my next reaction was - why use a join?

    DECLARE @fun INT
    SET @fun = 40000;

    --===== Find and produce the page
    SELECT TOP 100 e.EmployeeID, e.FirstName, e.LastName
      FROM (
            SELECT TOP (@fun+100) EmployeeID, FirstName, LastName
              FROM jbmEmployee
             ORDER BY EmployeeID
           ) e
     ORDER BY EmployeeID DESC

    The sad part is - if you hard-code the values - this isn't as performant as your method. But it does blow it away in the "parameterized test"...

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Well, good! At least it's not just me! 😀 Thanks for the help, Matt and Michael.

    I also ran into this the other night on something totally different in 2k5... I got around that one by forcing the join hints for the joins that were present during the hard-coding. Seems like 2k5 is starting to make the same poor choices that Oracle sometimes makes 😛
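    (Roughly speaking, forcing the join type on the paging query would look something like the sketch below - illustration only, since that other problem used different code. A per-join hint needs ANSI join syntax.)

    DECLARE @PageSize INT, @Skip INT
    SET @PageSize = 100
    SET @Skip     = 400000

    --===== Sketch: force the merge join explicitly with a per-join hint
    SELECT e.EmployeeID, e.FirstName, e.LastName
      FROM (SELECT TOP (@PageSize) EmployeeID
              FROM jbmEmployee
             WHERE EmployeeID NOT IN (SELECT TOP (@Skip) EmployeeID
                                        FROM jbmEmployee
                                       ORDER BY EmployeeID)
             ORDER BY EmployeeID) d
     INNER MERGE JOIN jbmEmployee e
        ON e.EmployeeID = d.EmployeeID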

    HEY! If any of you MVPs are watching this, this would be a REAL good fix for the proposed SP3 for 2k5.

    Anyone know Itzik? Maybe send him my test code and the problem... see what he comes up with... :w00t:

    --Jeff Moden



  • For what it's worth - I've found that it's often necessary to blow away your indexing assumptions, and try others. Meaning - it seems that 2005 might prefer an entirely different indexing scheme than 2000....

    Like - a multi-column index might need to be "flipped" (reverse the column order) for 2005 to want to use it. It's to the point now where I've built BOTH when I am not sure - and run for a while to see which one gets all of the action.....
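    Purely as a made-up illustration (hypothetical table and columns - nothing to do with your test table):

    CREATE INDEX IX_Orders_Customer_Date ON dbo.Orders (CustomerID, OrderDate) -- the 2000-era choice
    CREATE INDEX IX_Orders_Date_Customer ON dbo.Orders (OrderDate, CustomerID) -- the "flipped" version, to see if 2005 prefers it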

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Understood... but this doesn't even have anything to do with indexing... changing a constant to a variable causes a massively detrimental change in the execution plan. This is a bit like testing a car on high-octane gas and then having the system throw vegetable oil in the tank. On the similar problem I had, forcing the join type to what it was with the constants caused it to work just as fast with the variables. The optimizer really should be smart enough to know that.

    And this isn't even parameter sniffing because it's a script!

    --Jeff Moden



  • Jeff Moden (2/22/2008)


    Understood... but this doesn't even have anything to do with indexing... changing a constant to a variable causes a massively detrimental change in the execution plan. This is a bit like testing a car on high-octane gas and then having the system throw vegetable oil in the tank. On the similar problem I had, forcing the join type to what it was with the constants caused it to work just as fast with the variables. The optimizer really should be smart enough to know that.

    And this isn't even parameter sniffing because it's a script!

    No disagreement there - just jumping on the "why is the optimizer wrecking my old code" bandwagon...:):w00t:

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Heh... sorry about that, Matt. Thanks for the help...

    --Jeff Moden



  • Jeff Moden (2/22/2008)


    And this isn't even parameter sniffing because it's a script!

    It's caused by the same thing. When the optimiser gets a query, it knows the value of any parameters (stored proc, auto parameterisation, or sp_executeSQL) and the value of any constants. However, it doesn't know the value of the variables. This is because optimisation happens for an entire batch and before anything is executed.

    Imagine something like this

    DECLARE @Var1 DATETIME

    SELECT @Var1 = Somevalue From Sometable where SomeColumn = 3

    SELECT Somecolumns FROM SomeLargeTable WHERE AnotherColumn > @Var1

    At the point where the batch is compiled and optimised, none of the queries have run and the value of @Var1 is unknown.

    Since the optimiser doesn't know the value of the variable, it can't use the statistics to see how many rows would be affected, because it doesn't know what data to check in the stats. Since it doesn't know the number of rows, it makes an estimate. I believe its estimate is 30% of the total rows in the table.

    This can lead to really bad cardinality estimates (If you check your exec plan, you'll probably notice that estimated and actual rows are way different) and hence a very bad plan.
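    One quick way to see that mismatch without the graphical plan is SET STATISTICS PROFILE ON, which returns EstimateRows and Rows for every operator:

    SET STATISTICS PROFILE ON
    -- ... run the slow, variable-based paging query here ...
    SET STATISTICS PROFILE OFF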

    What I would recommend, rather than forcing the join hint (which fixes the symptoms and not the root problem), is to use the OPTIMIZE FOR hint to give the optimiser a value to use to estimate cardinality.
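    Applied to the paging query above, that would look something along these lines (just an illustration, using the values from the hard-coded test):

    DECLARE @PageSize INT, @PageNum INT, @Skip INT
    SET @PageSize = 100
    SET @PageNum  = 4000
    SET @Skip     = @PageSize*@PageNum

    --===== Sketch: tell the optimiser which values to assume for the variables
    SELECT e.EmployeeID, e.FirstName, e.LastName
      FROM jbmEmployee e,
           (SELECT TOP (@PageSize) EmployeeID
              FROM jbmEmployee
             WHERE EmployeeID NOT IN (SELECT TOP (@Skip) EmployeeID
                                        FROM jbmEmployee
                                       ORDER BY EmployeeID)
             ORDER BY EmployeeID) d
     WHERE e.EmployeeID = d.EmployeeID
    OPTION (OPTIMIZE FOR (@Skip = 400000, @PageSize = 100))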

    HEY! If any of you MVPs are watching this, this would be a REAL good fix for the proposed SP3 for 2k5.

    I doubt this is something that can be fixed in a service pack. Would likely need large-scale changes in how (and when) the optimiser compiles a query.

    P.S. If you would like some more details, pages 284-288 of Itzik's book 'Inside SQL Server 2005: T-SQL Programming' cover this, with some diagrams, sample queries and execution plans.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gail is right on the mark. I wouldn't necessarily recommend "optimize for" though - you can use a statement-level recompile (new in 2005) to get the optimal query plan using those variable values.

  • matt stockham (2/23/2008)


    Gail is right on the mark. I wouldn't necessarily recommend "optimize for" though - you can use a statement-level recompile (new in 2005) to get the optimal query plan using those variable values.

    Thanks, Matt... do you, by any chance, know what the BOL lookup for "statement-level recompile" might be (I tried that phrase and it didn't work) so I can study it? That definitely sounds like a good possibility for known compile problems... better than a trip through Dynamic SQL, for sure.

    --Jeff Moden



  • GilaMonster (2/23/2008)


    It's caused by the same thing. When the optimiser gets a query, it knows the value of any parameters (stored proc, auto parameterisation, or sp_executeSQL) and the value of any constants. However, it doesn't know the value of the variables. This is because optimisation happens for an entire batch and before anything is executed.

    That's kinda the conclusion I came to. Thanks for the great explanation, Gail.

    You don't happen to have a good URL for how to overcome parameter sniffing, do you? I like Matt's idea because not all recompiles are bad and it seems to fit perfectly for something like this, but I don't want to limit possibilities... knowing more than one method is always good.

    --Jeff Moden



  • Jeff Moden (2/23/2008)


    matt stockham (2/23/2008)


    Gail is right on the mark. I wouldn't necessarily recommend "optimize for" though - you can use a statement-level recompile (new in 2005) to get the optimal query plan using those variable values.

    Thanks, Matt... do you, by any chance, know what the BOL lookup for "statement-level recompile" might be (I tried that phrase and it didn't work) so I can study it? That definitely sounds like a good possibility for known compile problems... better than a trip through Dynamic SQL, for sure.

    Never mind... I found it... OPTION (RECOMPILE)... right?

    --Jeff Moden



  • Ok folks... the statement-level OPTION (RECOMPILE) that Matt spoke of really did the trick, both on this current problem and on the previous problems I had. The estimated execution plan still shows the nested loop, but the actual execution plan shows a nasty fast merge join, and the difference in performance is just awesome, as expected.

    The recompile takes less than a ms and, I guess, has the advantage of being able to handle anything rather than optimizing for a set number.

    Awesome stuff. Thanks for the help, everyone!

    Just for reference, here's the final test code including test table generation and the fix using OPTION (RECOMPILE)...

    --=============================================================================
    -- Create a temporary test table with a million employees to stress test the
    -- code with. THIS IS NOT PART OF THE SOLUTION!!!!
    --=============================================================================
    --===== If the temp table exists, drop it
    IF OBJECT_ID('TempDB..jbmEmployee','U') IS NOT NULL
       DROP TABLE jbmEmployee

    --===== Create the temp table
    CREATE TABLE jbmEmployee
            (
            EmployeeID INT PRIMARY KEY CLUSTERED, --don't name PK's on temp tables
            FirstName  VARCHAR(30),
            LastName   VARCHAR(30)
            )

    ;WITH cteTally AS
    (--==== Generates a 1 million row test table with predictable names in about 23 seconds
     SELECT TOP 1000000
            ROW_NUMBER() OVER (ORDER BY ac1.Object_ID) AS N
       FROM Master.sys.ALL_Columns ac1
      CROSS JOIN Master.sys.ALL_Columns ac2
    )
     INSERT INTO jbmEmployee
            (EmployeeID, FirstName, LastName)
     SELECT EmployeeID = t.N,
            FirstName  = 'FirstName'+RIGHT(REPLICATE('0',10)+CAST(t.N AS VARCHAR(10)),10),
            LastName   = 'LastName' +RIGHT(REPLICATE('0',10)+CAST(t.N AS VARCHAR(10)),10)
       FROM cteTally t
    GO
    ---------------------------------------------------------------------------------------------------
    --===== Hard-coding where variables would be produces a nasty fast merge join.
    SET STATISTICS TIME ON

    --===== Find and produce the page
    SELECT e.EmployeeID, e.FirstName, e.LastName
      FROM jbmEmployee e,
           (
            SELECT TOP (100) EmployeeID
              FROM jbmEmployee
             WHERE EmployeeID NOT IN (SELECT TOP (400000) EmployeeID
                                        FROM jbmEmployee
                                       ORDER BY EmployeeID)
             ORDER BY EmployeeID
           ) d
     WHERE e.EmployeeID = d.EmployeeID
    GO
    ---------------------------------------------------------------------------------------------------
    --===== Use of variables in this case causes a bad execution plan. This is the same query as
         -- above, but it uses loop joins because of the "parameter sniffing" problem. Adding
         -- OPTION (RECOMPILE) to the problematic statement with the parameters fixes the problem and
         -- the code runs as fast as its hard-coded cousin above.
    SET STATISTICS TIME ON
    DECLARE @PageSize INT
    DECLARE @PageNum  INT
    DECLARE @Skip     INT

    --===== Set the local variables for pagesize and pagenumber
         -- PageSize and PageNum would be parameters in a stored proc
    SET @PageSize = 10
    SET @PageNum  = 90000
    SET @Skip     = @PageSize*@PageNum

    --===== Find and produce the page
    SELECT e.EmployeeID, e.FirstName, e.LastName
      FROM jbmEmployee e,
           (
            SELECT TOP (@PageSize) EmployeeID
              FROM jbmEmployee
             WHERE EmployeeID NOT IN (SELECT TOP (@Skip) EmployeeID
                                        FROM jbmEmployee
                                       ORDER BY EmployeeID)
             ORDER BY EmployeeID
           ) d
     WHERE e.EmployeeID = d.EmployeeID
    OPTION (RECOMPILE) --LOOK!!! This overcomes the problem of "parameter sniffing".
                       --Without it, it'll do nested loops instead of a merge join.

    --Jeff Moden



