Strange variable behaviour.....

  • If I run the very simple query below I get a subtree cost of almost 200! If I hardcode the date values into the query it drops to about 10, which is what I'd expect. Is this a bug? The date columns are datetime, I've not encountered many issues comparing datetime with varchar in the past, and I'm a bit baffled by this.

    DECLARE @dteFrom datetime,
            @dteTo datetime

    SELECT @dteFrom = convert(varchar(10), getdate() - 30, 103),
           @dteTo   = convert(varchar(10), getdate() + 30, 103)

    SELECT Col1,
           Col2
    FROM IndexedView1 tmcp WITH (NOEXPAND)
    INNER JOIN Table1 PE ON PE.Col1 = tmcp.Col1
    WHERE ISNULL(pe.Col3, pe.Col4) <= @dteTo
    AND ISNULL(pe.Col3, pe.Col4) >= @dteFrom

    I've reindexed and updated statistics, but no change.

    Any ideas? Is this a bug?

    _________________________________________________________________________________
    SQLGeordie
    Web: Jarrin Consultancy
    Blog: www.chrisjarrintaylor.co.uk
    Twitter: @SQLGeordie

  • Is this the entire query, or is it part of a stored proc with input parameters?

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • Swirl80 (5/19/2010)


    If I run the very simple query below I get a subtree cost of almost 200! If I hardcode the date values into the query it drops to about 10, which is what I'd expect. Is this a bug?

    No.

    http://sqlinthewild.co.za/index.php/2008/02/25/parameter-sniffing-pt-2/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • SQL Server has no way of knowing the actual values of the variables when it creates the execution plan, and the rule is that a > or a < (greater-than or less-than) comparison will cause SQL Server to estimate that the query will return 30% of the total rows in the table. If you use both > and <, or BETWEEN, SQL Server will estimate 30% * 30% = 9%.

    Because of this I'm guessing your plan includes a merge or hash join, and that your tables have millions of rows, causing the cost estimate to be high.

    To solve this you can use the query hint OPTIMIZE FOR (<parameters>) (look in BOL) to tell SQL Server what the date range will normally be, which will probably give you a more optimal plan. You can also use the RECOMPILE hint, but this adds CPU overhead, because SQL Server has to generate a plan every time you execute the query.
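    As a sketch, applied to the query from the first post (the dates in the hint are made-up placeholders for a representative range; OPTIMIZE FOR only accepts literal values):

    SELECT Col1, Col2
    FROM IndexedView1 tmcp WITH (NOEXPAND)
    INNER JOIN Table1 PE ON PE.Col1 = tmcp.Col1
    WHERE ISNULL(pe.Col3, pe.Col4) <= @dteTo
    AND ISNULL(pe.Col3, pe.Col4) >= @dteFrom
    OPTION (OPTIMIZE FOR (@dteFrom = '20100419', @dteTo = '20100618'))
    -- or, to compile a fresh plan using the actual variable values on each run:
    -- OPTION (RECOMPILE)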

  • Interesting. I knew about parameter sniffing in terms of a parameter being passed to a procedure and the optimiser not being able to work out its value, but not when it's explicitly set within the query itself. Surely this is a bug?

    I can't use OPTIMIZE FOR as I don't want to hard-code a date range in there; my date range is dynamic (+/- 30 days) and I'd like to keep it that way, and you can't put GETDATE() into the OPTIMIZE FOR option. Normally I would get around this by putting the +/- 30-day calculation into the query itself, but the "real" query actually takes its dteFrom and dteTo from two columns in a table (yes, I've tested it with and without the table to get the parameters, and get similar results), so I can't do that without a subquery, which again I'd rather avoid if I can.

    Is this still an issue in 2008?

    thanks for all the input so far 😉

    _________________________________________________________________________________
    SQLGeordie
    Web: Jarrin Consultancy
    Blog: www.chrisjarrintaylor.co.uk
    Twitter: @SQLGeordie

  • Swirl80 (5/20/2010)


    Interesting. I knew about parameter sniffing in terms of a parameter being passed to a procedure and the optimiser not being able to work out its value, but not when it's explicitly set within the query itself. Surely this is a bug?

    Not at all. Did you read the blog post that I referenced?

    What you've got is not parameter sniffing, you've got the absence of parameter sniffing. The optimiser can only sniff the value of parameters. Since you're using variables, not parameters, the optimiser has no way whatsoever to tell what their values are at compile time (because at compile time they don't have values) and hence it has to use some other methods to guess how many rows are affected.

    When you're dealing with equalities, SQL can use the density of the column (as exposed in the statistics) to make that guess. With an inequality it can't, and guesses that the query will affect 30% of the rows in the table.
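    Both the density and the histogram are visible in the statistics themselves; for example (the index name here is a made-up placeholder):

    DBCC SHOW_STATISTICS ('dbo.Table1', 'IX_Table1_Col3')

    The second result set shows the density vector, used for equality estimates; the third shows the histogram, which the optimiser can only use when the compared value is known at compile time.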

    It's like me asking you this: "I'm thinking of two surnames. How many entries are there in the telephone directory between the two names?"

    When you change from using variables to constants (values specified within the query), then because the values are available to the optimiser at compile time, it can use the histogram to get a much better estimation of row counts (in this case a lot more accurate one too) and the cost becomes what you would expect.
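    A quick way to see this for yourself (reusing the table and column names from the query above) is to compare the estimated row counts in the two plans:

    -- Variable: value unknown at compile time, so the optimiser guesses 30% of the table
    DECLARE @d datetime
    SET @d = '20100401'
    SELECT Col1 FROM Table1 WHERE Col3 >= @d

    -- Constant: value known at compile time, so the estimate comes from the histogram
    SELECT Col1 FROM Table1 WHERE Col3 >= '20100401'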

    Is this still an issue in 2008?

    All versions of the optimiser behave this way.

    Use OPTION(RECOMPILE). Because it's a statement-level recompile, when that recompile happens, the values for the variables are known and the optimiser should be able to get a much better cost estimate. Tradeoff is higher CPU from the frequent compiles.

    The other option is a sub procedure. Pseudocode:

    CREATE PROC OuterProc AS

    DECLARE @dtStart datetime, @dtEnd datetime

    SELECT @dtStart = value, @dtEnd = othervalue FROM SomeTable

    EXEC InnerProc @dtStart, @dtEnd

    GO

    Then the inner proc does the query based on the parameter values passed. Since they are parameters, the optimiser can sniff the values at compile time, hence you'll get a much more optimal plan.
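    The inner procedure would then be something along these lines (a sketch reusing the original query):

    CREATE PROC InnerProc @dtStart datetime, @dtEnd datetime AS

    SELECT Col1, Col2
    FROM IndexedView1 tmcp WITH (NOEXPAND)
    INNER JOIN Table1 PE ON PE.Col1 = tmcp.Col1
    WHERE ISNULL(pe.Col3, pe.Col4) >= @dtStart
    AND ISNULL(pe.Col3, pe.Col4) <= @dtEnd

    GO

    Because @dtStart and @dtEnd are now parameters rather than variables, the optimiser can sniff their values when the procedure compiles.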

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • You don't have to use the actual from and to dates in OPTIMIZE FOR, just a date range that would generate a representative set of rows, roughly matching the actual values of the variables.

    SQL Server will use the values in the OPTIMIZE FOR statement to analyze the statistics for the index, and generate a plan that is more suited for your needs.

    But I urge you to be careful if you choose to use OPTIMIZE FOR. If you delete or update rows and the OPTIMIZE FOR values are no longer representative of the actual values, SQL Server will produce a sub-optimal plan. Say you delete old rows, and based on the statistics there are no rows (or few rows) matching the hint, but your actual parameter values would return tens or hundreds of thousands of rows. Then you would be in trouble, because SQL Server could decide that a nested loop join is good for that few rows (based on OPTIMIZE FOR), even though that many rows would benefit from a hash match or merge join.

    If you cannot predict good values for OPTIMIZE FOR, then I would suggest you use OPTION (RECOMPILE) and take the overhead in CPU due to recompilation.

  • I may be wrong here, but this:

    ISNULL(pe.Col3, pe.Col4)

    is going to cause scans, either table or index, and you won't get seeks. Functions on columns like this render them non-sargable (not Search-ARGument-able).
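    One sargable alternative (a sketch; whether it pays off depends on the indexes on Col3 and Col4) is to split the ISNULL into its two cases, so each column is compared directly:

    WHERE (pe.Col3 >= @dteFrom AND pe.Col3 <= @dteTo)
    OR (pe.Col3 IS NULL AND pe.Col4 >= @dteFrom AND pe.Col4 <= @dteTo)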

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (5/20/2010)


    I may be wrong here, but this:

    ISNULL(pe.Col3, pe.Col4)

    is going to cause scans, either table or index, and you won't get seeks.

    Yeah true. That as well.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (5/20/2010)


    Grant Fritchey (5/20/2010)


    I may be wrong here, but this:

    ISNULL(pe.Col3, pe.Col4)

    is going to cause scans, either table or index, and you won't get seeks.

    Yeah true. That as well.

    Funnily enough, no it doesn't. Uses an index seek.

    What I've done is add an additional filter to the query:

    ISNULL(pe.Col3, pe.Col4) >= @dteFrom
    AND ISNULL(pe.Col3, pe.Col4) <= @dteTo
    AND ISNULL(pe.Col3, pe.Col4) >= dateadd(mm, -12, getdate())

    as I know that it will never go beyond 12 months in the past. This brings the execution plan back to how it should be.

    thanks for all the input, greatly appreciated

    _________________________________________________________________________________
    SQLGeordie
    Web: Jarrin Consultancy
    Blog: www.chrisjarrintaylor.co.uk
    Twitter: @SQLGeordie

  • Would you mind posting the definition of the table aliased as pe?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Swirl80 (5/20/2010)


    GilaMonster (5/20/2010)


    Grant Fritchey (5/20/2010)


    I may be wrong here, but this:

    ISNULL(pe.Col3, pe.Col4)

    is going to cause scans, either table or index, and you won't get seeks.

    Yeah true. That as well.

    Funnily enough, no it doesn't. Uses an index seek.

    On what index?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Swirl80 (5/20/2010)


    What I've done is add an additional filter to the query:

    ISNULL(pe.Col3, pe.Col4) >= @dteFrom
    AND ISNULL(pe.Col3, pe.Col4) <= @dteTo
    AND ISNULL(pe.Col3, pe.Col4) >= dateadd(mm, -12, getdate())

    as I know that it will never go beyond 12 months in the past. This brings the execution plan back to how it should be.

    thanks for all the input, greatly appreciated

    That's good to hear.

    From your initial post I was thinking it may be parameter sniffing - hence the first question I asked. But Gail picked up on it and explained it quite well.

    I have also seen a blog post somewhere about the need to pass in more information in a situation like this in order to get it to perform better. Seems like you found that happy medium.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events
